ONE STATISTICAL WORLD


EARL LATHAM
University of Minnesota* 


THE construction of a world political system has held the center
of popular attention at the expense of other international developments
of importance, but the work of building international organizations
also goes forward in many different subject matter and technical
fields, less conspicuously and therefore less well known. While the
dramatic issues of international politics command the stage, interna- 
tional institutions have been formed or are forming to deal with prob- 
lems of health, agriculture, trade, finance, refugees, employment, 
education, and culture. These agencies of international administration 
share a common need for reliable data, scientifically collected and as- 
sembled, and disseminated as widely as demand requires. 

Sound statistics are basic to the formulation and administration of 
international policies, and the organization of comprehensive statisti- 
cal services goes apace with the development of the international 
agencies they will serve. A world system is emerging from numerous 
independent and incomplete fragments of statistical activity: the 
United Nations Statistical Commission; the Statistical Division of the
Department of Economic Affairs of the UN; the specialized inter- 
national agencies like the Food and Agriculture Organization; regional 
statistical organizations like the Inter American Statistical Institute; 
and private international statistical organizations like the International 
Statistical Institute. In addition to these world developments, parallel 
action is taking place within national statistical systems to order affairs 
in such a way that the responsibilities of membership in a world sta- 
tistical system can be discharged most effectively. In the United 
States this parallel action is centered on a new organization called the 
Federal Committee on International Statistics. These efforts to create
a world of statistics bear promise of creating new forms of professional
cooperation which will reflect, institutionally, the true universality of
statistical science.

* Formerly with the Division of Statistical Standards, Bureau of the Budget.

The hub of this new statistical universe is the United Nations, 
with reference to which the various segmented and specialized parts 
should assume the ordered arrangement of wheel to axle. To use this 
figure of speech in another way, the skill with which axle and wheel 
are joined may determine both the rate and smoothness of passage of 
UN affairs, if not indeed their direction and course. Whether these 
affairs follow the track of Phaeton or Apollo will depend in great 
part upon the success with which international statistical service plots 
the course of policy that neither freezes nor burns. 


UNITED NATIONS STATISTICAL COMMISSION 


Of the five elements of international statistical organization men- 
tioned, those of presiding importance are in the United Nations. The 
United Nations Statistical Commission comes first in point of time,
and is certainly not subordinate in measure of influence.

At the time of the San Francisco Conference, when the United 
Nations Charter was drafted, some effort was made to draw attention 
to the desirability of establishing international statistical services in 
the United Nations. A conference of the principal statistical officials 
in the United States Government was called by the Bureau of the 
Budget to prepare and agree upon a statement of need for international 
organization in March 1945, with representatives in attendance also 
from certain accessible international organizations, including the 
Economic, Financial, and Transit Department of the League of Na- 
tions, the International Labor Office, the UN Interim Commission on 
Food and Agriculture and the Inter American Statistical Institute. This 
conference proposed to the Secretary of State that questions relating 
to international statistical organization be considered at San Fran- 
cisco but this was not done, and the United Nations Charter is not 
explicit on the subject. 

The Preparatory Commission of the United Nations then met in 
London in the autumn of 1945 and the conference of international 
statisticians made further proposals to the Department of State which 
were transmitted with its approval to the United States Delegate to 
the Preparatory Commission. The United States Delegation did some 
excellent staff work in the preparation of a paper titled, “Observations 
on Organization of Statistical Work of the Secretariat.” 

It had previously been recommended that a Statistical Commission
be established in the United Nations Economic and Social Council.
The United States delegation in its “Observations” proposed further 
that the statistical services of the United Nations be grouped within 
the UN Secretariat in an Office of Economic and Social Affairs, directly 
under an Assistant Secretary-General having charge of that office. 
As the Secretariat of the United Nations was finally developed, it 
was decided to create a Department of Economic Affairs and a Depart- 
ment of Social Affairs in the Secretariat instead of one combined de- 
partment as had been recommended to the Preparatory Commission. 

Heeding the recommendation of the Preparatory Commission, the 
Economic and Social Council in February 1946 established the Sta- 
tistical Commission. By resolution adopted in June 1946 the Com- 
mission was charged with responsibilities for promoting the develop- 
ment of national statistics and improving their comparability; co- 
ordinating the statistical work of the specialized international agencies; 
developing the central statistical services of the Secretariat of the 
United Nations; promoting the improvement of statistics and sta- 
tistical methods; and advising the organs of the United Nations on 
general questions relating to the collection, interpretation, and dis- 
semination of statistical information. There is a close similarity between 
these functions and those vested in the Division of Statistical Standards 
of the Bureau of the Budget, a similarity which is not accidental since 
the example of the latter was freely used in the original draft state- 
ments of responsibility. 

After establishing the Statistical Commission, the Economic and 
Social Council appointed a “nucleus commission” of nine members serv- 
ing, not as nationals but as experts from the following countries: 
France, United Kingdom, India, United States, Norway, Brazil, 
China, the Ukrainian SSR and the USSR. The appointee from Brazil 
was not able to serve for reasons of ill health and the appointment re- 
served for the Ukraine was not acted upon. The nucleus commission 
met in New York in April and May of this year and elected as chair- 
man its United States member, Dr. Stuart A. Rice, Assistant Director 
in charge of Statistical Standards, Bureau of the Budget. The Com- 
mission considered and reported to the Economic and Social Council 
on its definitive composition and terms of reference. All of its recom- 
mendations were adopted by the Council except that which proposed 
that Commission members serve in their private capacities; the Council 
resolved that members should be representatives of their respective
governments.

In its permanent organization the Commission will contain not
more than twelve members, confirmed by the Economic and Social
Council after nomination by member countries of the UN, and it will 
meet three times a year. It was the consensus of the Commission that 
regional statistical organizations should be encouraged and that the 
International Statistical Institute should be brought into some regular 
effective relation to the United Nations. Considerable attention was 
given to the statistical activities of the specialized international agen- 
cies and draft language was prepared for inclusion in the terms of 
agreement negotiated between the specialized agencies and the United 
Nations, covering the coordination of their statistical programs. The 
Commission voted to create a Sub-Commission on Sampling and the 
Council adopted its recommendation, authorizing the Commission 
itself to create the Sub-Commission, a departure from the ordinary 
procedure according to which the Council makes the appointments. 

The Statistical Commission set the pace for other commissions of the 
United Nations in the speed and efficiency with which it set about the 
tasks of organizing itself, and marked out the lines for the development 
of the United Nations statistical services. It has the great advantage 
of being located near the top of the UN hierarchy and possessed of a 
separate organizational life and identity. 


UNITED NATIONS SECRETARIAT 


Since the Statistical Commission is an advisory body and since it 
will not be in continuous session, it is clear that the day to day tasks 
of international statistical administration will have to be performed by 
the “statistical secretariat” of the UN, a short-hand way of referring 
to the Statistical Division of the Department of Economic Affairs of 
the United Nations. 

When the decision was made to create a Department of Economic 
Affairs and a Department of Social Affairs instead of a single Office of 
Economic and Social Affairs in the UN Secretariat, it then became ne- 
cessary to decide in which of the two the central statistical office should 
be located. As between the two departments, the case for lodgement in 
Economic Affairs was the more persuasive since it is expected that the 
chief statistical problems will be economic. Although more persuasive, 
the case was nevertheless weak; the better arrangement would have 
been to establish a single Department of Economic and Social Affairs, 
since the distinction between the two subject matter fields is frequently 
an abstract and artificial one, and since the work of both Departments 
will be related to one organ, the Economic and Social Council. 

The United States Delegation to the United Nations Preparatory
Commission in London proposed that the statistical secretariat be
given substantial functions of control and coordination over statistical 
activities of the United Nations and its affiliates. It proposed that the 
secretariat be given responsibility for providing statistical information 
to the Secretary-General and the Assistant Secretary-General; validat- 
ing statistical findings used in UN proceedings; performing functions 
of coordination of statistical activities within the UN and among the 
specialized international agencies; collecting statistics from member 
governments and other international organizations; maintaining cer- 
tain statistical publications; collecting statistics, where feasible, for 
affiliated statistical organizations; recommending uniform standards 
for national and international statistics; consulting with and advising 
member governments and international organizations about statistical 
problems; and providing a general clearing house service for interna- 
tional statistical data. These recommendations were accepted by the 
Preparatory Commission as a valuable statement of the functions to 
be performed by the statistical secretariat. 

At the moment of writing, the statistical secretariat has not been 
organized, but the general lines of its development and relationship 
to the Statistical Commission can be discerned. The Statistical Com- 
mission is responsible for recommending the general statistical policies 
of the United Nations; in doing so it will be assisted by the statistical 
secretariat. In addition, the continuous tasks of coordination of the 
statistical activities of the United Nations, its organs, and its affiliated 
specialized agencies will be performed by the secretariat. In a real 
sense therefore it will act as the central UN control in the development 
of systematic international statistical organization. For example, sta- 
tistical data required by two or more international organizations may 
be collected by the central statistical secretariat, or may be collected 
by one of them designated by the secretariat on the recommendation 
of the Statistical Commission. It will probably have primary responsi- 
bility for collecting general purpose data of various kinds, like popula- 
tion and national income statistics, although it is doubtful whether it 
will actually gather much original data directly from respondents. It 
is more likely that its collecting function in the main will be one of 
assembling or collating data already in existence. There would, for 
example, be little point in having the statistical secretariat attempt 
to take the decennial census of the United States, even if it were 
constitutional or otherwise feasible for it to do so. Where general 
purpose statistical information is wanted from statistically developed 
countries, however, it is entirely possible that the central UN statistical
secretariat may undertake, with the cooperation of the country involved,
to collect original data.


SPECIALIZED AGENCIES 


The specialized international agencies are those that are or will 
be affiliated to the United Nations but not as an organic part of its 
structure. These organizations include: the International Bank for
Reconstruction and Development; the International Monetary
Fund; the Food and Agriculture Organization; the Provisional International
Civil Aviation Organization (PICAO); the United Nations Educational,
Scientific, and Cultural Organization; and the International Labor Organization. In
addition, the World Health Organization is being organized and plans 
are being made for an International Trade Organization and an International
Refugee Organization. Each of these specialized agencies exists
independent of the formal organization of the United Nations but will 
be related to it according to the terms of agreements of affiliation to 
be made with the United Nations. In their entirety they present 
special problems of statistical coordination for the Commission and for 
the Statistical Division of the Department of Economic Affairs. 

The problem is not unlike that in the United States Government 
where numerous Federal agencies collect data in areas of their special 
competence and only the functions of coordination and the develop- 
ment of statistical standards are centralized. There is this difference, 
however: The central secretariat of the United Nations is expected to 
perform some functions of collection, as indicated. Otherwise the spe- 
cialized agencies will pursue their own statistical programs, which will 
be harmonized with those of other such agencies and with the UN 
statistical secretariat, on the advice of the Statistical Commission. 
These relations will be defined in the terms of agreement of affiliation. 
The Statistical Commission is deeply concerned with the terms of 
these agreements of affiliation, since the opportunity is thus afforded 
to set out the basic lines along which the statistical activities of the 
specialized agencies may be coordinated. 

The techniques of coordination are familiar, and presumably will 
be applied to this international statistical context. The basic pattern 
of relationships is decentralization. As already indicated, where two 
or more specialized agencies want the same kind of data, one of them 
may be designated as the focal agency with primary responsibility 
for its collection. Similarly other focal points may be designated for the 
receipt and distribution of data. Arrangements for the clearance of information
about statistical activities can be operated through the
secretariat and the Statistical Commission. For example, the specialized
agencies will be informed when there is any matter before the
Commission in which they might be interested, so that they may at- 
tend and be heard. Their appearance at such deliberations is not a 
matter of right but of comity, since the Commission may want to 
discuss the statistical activities of the specialized agencies in executive 
session when some line of policy or decision is to be determined. More 
regular and routine arrangements for clearance can be managed 
through the secretariat, among such being the dissemination of re- 
ports and bulletins, circulation of announcements about events of 
interest and importance, meetings and conferences, and so on. A well 
devised reporting system that really communicates is an indispensable 
part of the clearance apparatus. 

The uniform language covering the statistical activities of the spe- 
cialized international agencies binds the UN and the agency to strive 
for maximum cooperation, the elimination of all undesirable duplica- 
tion between them, and the most efficient use of their technical per- 
sonnel in collection, analysis, publication, and dissemination of sta- 
tistical information by each of them respectively. The agency and the 
UN also agree to combine their efforts to secure the greatest possible 
usefulness and utilization of statistical information and to minimize 
the burdens placed upon national governments and other organizations 
from which such information may be collected.1

The agency recognizes the United Nations as the central agency 
for the collection, analysis, publication, standardization and improve- 
ment of statistics serving the general purposes of international organi- 
zations. In turn, the United Nations recognizes the specialized inter- 
national agency as the appropriate agency for the collection, analysis, 
publication, standardization, and improvement of statistics within its 
special sphere, without prejudice to the right of the United Nations to 
concern itself with such statistics so far as they may be essential for 
its own purposes or for the improvement of statistics throughout the 
world. In consultation with the specialized agencies the UN assumes 
the responsibility to develop administrative instruments and proce- 
dures through which effective statistical cooperation may be secured 
between the United Nations and the agencies brought into relationship 
with it. The terms of agreement also contain an affirmation that the 
collection of statistical information should not be duplicated by the
United Nations or any of the specialized agencies whenever it is practicable
for any of them to utilize information or materials which another
may have available. In order to build up a central collection of
statistical information for general use, it is agreed that data supplied
to the agency for incorporation in its basic statistical series or special
reports should so far as practicable be made available to the United
Nations.

1 See for example the “Draft Agreement Between the United Nations and the Food and Agriculture
Organization of the United Nations,” United Nations Economic and Social Council, Document E/57,
June 10, 1946, Article XII, pp. 7-8.

Some of the statistical needs of international administration have 
already been defined. For instance the Economic and Employment 
Commission has recommended to the Economic and Social Council 
that it will require firm data on national resources, human and ma- 
terial, such as plant capacity and labor force, if it is to do its work 
intelligently. The PICAO is holding conferences to determine how the 
cost of constructing thirteen weather stations in the North Atlantic 
should be distributed among the benefiting countries. In addition, it 
will need other statistical information on traffic, operations costs, and 
finance, if it is to do its work as an international agency effectively. It 
is certain that the World Health Organization will require numerous 
data on disease, nutrition, and medical services. The work of the Food 
and Agriculture Organization will require data on food supplies, crop 
acreage, food prices, crop forecasts, and agricultural methods. In more 
homely but very practical vein, the UN will need statistical informa- 
tion to fix the quotas of contribution that each nation will be expected 
to meet. If the UN salaries are paid in terms of net after taxes, systematic
information on taxes throughout the world will have to be obtained,
for the UN recruits its personnel throughout the world. All of these
statistical considerations and concerns grow out of the operations of 
international administration. 

The Statistical Commission and the Statistical Division have an 
interest in all of these matters, as the list of delegated responsibilities 
discloses. The Commission will be interested in promoting the statisti- 
cal development of such countries as China, for instance, in order to 
provide more reliable data on population. It may want to make recom- 
mendations looking towards improving the comparability of national 
income estimates for tax and quota purposes. The Statistical Division 
and the Commission would undoubtedly take steps to see that the 
World Health Organization and the FAO use the same data on nutri- 
tion, to take a random example of possible duplication because of re- 
lated subject matter interests. The Commission will be interested in 
supporting the central secretariat in the development of special techniques
for gathering and disseminating general purpose statistics. Of
use in this and other respects will be the Sub-Commission on Sampling
which the Commission has been empowered to establish. At all points, 
the Commission will be in position to promote improved statistics and 
statistical methods by study and recommendation, and it will be avail- 
able to the organs of the United Nations for consultation and advice on 
general questions of statistics. 


THE LEAGUE OF NATIONS AND THE UN 


These three elements—United Nations Secretariat, the Statistical 
Commission, and the specialized international agencies—are the three 
sides of the new international statistical center which is being formed. 
They represent a considerable advance over the statistical arrangements
which prevailed under the League of Nations.2 In the main
the League’s statistical work was based on the International Conven- 
tion Relating to Economic Statistics drafted by the International Con- 
ference Relating to Economic Statistics which met in Geneva in 1928. 
Occupying the place now filled by the Statistical Commission was the 
Committee of Statistical Experts which was set up under the Conven- 
tion to perform statistical services in major areas of economic interest. 
The Committee was assisted by the Secretariat of the League of Na- 
tions but the latter in no way dominated or even led the work of the 
Committee. It performed work on request of the League, its affiliates 
and on its own initiative. 

The Committee of Statistical Experts, however, did not meet for the 
first time until March 1931, some twelve years after the establishment 
of the League of Nations. There was no meeting in 1932 but thereafter 
there were annual meetings until 1939, when the outbreak of war 
brought them to an end. The meetings were of short duration, lasting 
not more than eight days. The statistical work of the Committee 
was carried on by sub-committees at the time of the annual sessions 
and, in the interim, through correspondence. Under these conditions, 
studies were completed on a minimum list of countries, to be distin- 
guished in foreign trade statistics; countries of provenance and 
destination in foreign trade; minimum list of commodities for in- 
ternational trade statistics; statistics of the gainfully occupied popu- 
lation; indices of industrial production; and on timber, housing, tourist 
and mining and metallurgical statistics. Some work was also done (the
studies were never completed) on indexes of prices; financial statistics;
balance of payments; and indices of quantum and prices in international
trade. Other statistical activities of the League involved
health and drug control, communications and transport, and territories
and minority questions but were never very extensive in comparison
with the work described.

2 An account of the statistical and research activities of the League of Nations was prepared for
the United Nations Statistical Commission by A. Rosenborg, Head of League of Nations Mission in the
United States. This account draws attention to the study titled “The Economic and Financial Organization
of the League of Nations, a Survey of Twenty-five Years Experience,” written by Martin Hill,
to be published shortly by the Carnegie Endowment for International Peace.

Administrative coordination of all League statistical activity was 
operated through the Inter-Departmental Statistical Committee of 
the Secretariat, which also concerned itself with the standards of pres- 
entation of the relevant data published by the League, and which 
met at regular intervals under the chairmanship of the Head of the 
Economic Intelligence Service.3 The secretary of the Committee of
Statistical Experts served as the secretary of the Inter-Departmental 
Committee. The Chief of the Statistical Section of the International 
Labor Office participated in the meetings of the Committee which 
included within its purview questions of collaboration with the ILO. 
Other devices of coordination were the interlocking memberships of 
the ILO and the International Institute of Agriculture whose chiefs 
served ex officio as associated members of the Committee of Experts. 
Reciprocally, the Economic Intelligence Service was represented ex 
officio on the statistical expert committees of the ILO and the IIA. 

Certain weaknesses in this scheme of arrangements have been re- 
paired in the organization of the United Nations statistical services. 
First, the Statistical Commission will not do any of the actual work of 
collecting, assembling, publishing or disseminating statistics that was 
the responsibility of the Committee of Experts. This leaves the Com- 
mission free to devote itself to general problems of standards, policy 
and organization of statistical activity. Second, the Statistical Com- 
mission will meet three times as often as the Committee, assuring 
continuity of expert and disinterested attention to general policy 
matters. Third, the terms of relationship with the specialized inter- 
national agencies, statistics-wise, fix in fundamental procedure the 
basic pattern of a comprehensive statistical service. Fourth, the or- 
ganization of the United Nations’ major statistical development begins 
with the beginning of the United Nations itself, and not twelve years 
later. Fifth, the central statistical organization of the United Nations 
is comprehensive in conception and plan; the statistical services of the 
League of Nations were not developed in the same degree. Sixth, to
the extent that the United Nations is politically stronger than the
League (if it is), the statistical services of the United Nations are likely
to be invigorated and fortified by the prestige and power of the parent
authority.

3 This paragraph is based on information contained in the account of A. Rosenborg, op. cit., and a
related account of the League of Nations Committee of Statistical Experts, prepared by E. Dana
Durand, U. S. Member of the Committee. Both these papers were submitted to the United Nations
Statistical Commission for its consideration in April 1946.


REGIONAL STATISTICAL ORGANIZATIONS 


One of the notable developments in statistical administration in the 
last six years is regional statistical organization, the furthest advanced 
and the most mature being the Inter American Statistical Institute, 
organized in 1940 as a result of conversations held at the Washington 
Session of the Eighth American Scientific Congress in that year. The
Statistical Commission has recommended as a policy that regional 
statistical organizations be encouraged, largely as a result of the ex- 
ample which the Inter American Statistical Institute (called IASI) 
supplies. A brief account of the IASI may help to make clear what it 
is that the Statistical Commission sponsors by its approval. 

The Twenty-fifth Session of the International Statistical Institute 
was originally scheduled for Washington in 1939. Upon the outbreak 
of war in Europe, the session was first postponed a year and then 
indefinitely deferred. At the suggestion of American members of the 
International Statistical Institute, a statistical section was added to 
the program of the Eighth American Scientific Congress. Members of 
the International Statistical Institute from four American nations met 
during the Congress to organize the IASI, with the principal objective 
of encouraging the development of statistical science and administra- 
tion throughout the Western Hemisphere. In particular, its program 
was aimed at improving the methods used in the collection, tabula- 
tion, analysis, and publication of both official and unofficial statistics, 
and at obtaining a greater degree of international comparability in 
these statistics. 

In the ensuing six years, the IASI has gained the official membership 
of almost all of the American governments, on whom it depends for 
support, and has come to play a very influential part in Western 
Hemisphere statistical affairs.4 Consideration is being given to a proposal
to make the IASI the statistical arm of the Pan American Union,
to make it, in short, an “operating” agency through which the collective
statistical interests of the American governments can be implemented.
If these negotiations fulfill the promise they bear, the IASI will become
the official statistical instrument of the Inter American System,
created by the Act of Chapultepec and tacitly recognized in Chapter
VIII (on “Regional Arrangements”) of the United Nations Charter.5

4 An illustration of this influence is the extent to which IASI members have been called upon to
serve in various capacities in the United Nations. Of the seventy-eight “constituent” or professional
members, three were designated as their governments’ representatives in the first session of the
eighteen-member Economic and Social Council of the UN. These included the Council’s Second Vice President.
Two others, the Institute’s President and First Vice President, were appointed as members of the
nine-member “nucleus” Statistical Commission and one of these was elected by the Commission as its
Chairman. Still another IASI member was recently designated by his government as the Ambassador of his
country, Mexico, to the United States. Recommendations of the Institute’s Executive Committee,
meeting in Rio de Janeiro in January 1946, have already in numerous instances been given positive effect
by various American governments.

Interest in regional statistical organization is manifest in other 
parts of the world also. For instance, statistical leaders in India are 
exploring the possibility of creating a similar organization for India 
and the adjoining countries. During the years of the war, the Middle 
East Statistical Bureau with headquarters in Cairo did much to foster 
and develop improved statistical activity in the Middle East. In 
Europe, the Emergency Economic Commission for Europe sketched 
the outlines of European statistical organization. There is similar in- 
terest in regional statistical organizations in East Africa and South 
Africa. Except for the Inter American Statistical Institute the move- 
ment for regional organization is exploratory and tentative, with some 
reasons for belief, however, that it will spread and accelerate. The chief 
problems are geographic and organizational, the latter being the more 
difficult. In these respects, the administrative problem that confronts 
the United Nations is different from that involved in the statistical 
activities of the specialized agencies, where the major problems of 
coordination are those of subject matter competence and jurisdiction. 

Two principal questions are raised by regional statistical organiza- 
tions: What should be their chief functions in a world statistical organi- 
zation; and what should be their relation to the United Nations on the 
one hand and to member countries on the other? 

On the first point, it may be said that much of the interest in the 
further development of regional statistical organization centers on the 
benefit expected from the performance of various service functions, 
especially in the promotion of statistical education and the facilitation 
of common programs. Such service activities would not contradict but, 
indeed, would foster and develop the general statistical aims of the 
United Nations. Regional organizations so oriented in purpose and pro- 
gram would fit into world statistical organization without disharmony. 

5 The language of Article 52, Section 1 of the Charter is as follows: “Nothing in the present Charter 
precludes the existence of regional arrangements or agencies for dealing with such matters relating to 
the maintenance of international peace and security as are appropriate for regional action, provided 
that such arrangements or agencies and their activities are consistent with the Purposes and Principles 
of the United Nations.” 

This then evokes the second principal question raised by regional 
statistical organizations. Some voice has been given to the view that the 
regional organizations might constitute an intermediate level between 
the United Nations and its affiliated specialized agencies and the mem- 
ber countries through which the international organizations would 
deal with member countries. There are forceful objections to this 
kind of arrangement. Excessive layers of authority between member 
nations and the central organs of the United Nations will tend to 
defeat that sense of participation in a world community which the 
United Nations Charter seeks to facilitate. It might happen that re- 
gional statistical organizations endowed with formal “line” authority 
would tend to displace the member nations as the basic elements 
making up a new world order. Such an arrangement would not only 
deprive the member nations of direct relation with the central 
headquarters of the United Nations, but would also prevent the 
latter from utilizing the full benefit of a strong central secretariat, the 
development of which would almost certainly be retarded if kept from 
direct contact with the principal producers of statistics—the member 
nations. The development and promotion of statistical education by 
the central statistical authorities would also tend to be handicapped if 
the latter had to function through intermediaries. Finally, the coverage 
of regional organizations is as yet confined to the Western Hemisphere. 
If regional statistical organizations were to be created universally to 
administer United Nations statistical programs, they would have to be 
created for that purpose by the United Nations. The strength of the 
IASI lies in its spontaneous and uncontrived evolution, out of the felt 
needs of the region it serves. Imposition of such an organization from 
above could create hazards of indifference or resistance which the 
United Nations cannot now afford, if indeed ever. The development of 
regional statistical organizations as service enterprises, however, rooted 
in the felt desires of the areas to be served, and devoted to statistical 
education in its many phases, promises interesting and significant de- 
velopments for statistical science and administration. 


PRIVATE INTERNATIONAL ORGANIZATIONS 


Article 71 of the United Nations Charter empowers the Economic 
and Social Council to make suitable arrangements for consultation with 
non-governmental organizations on matters within its competence. The 
Statistical Commission in its capacity as adviser to the Economic and 
Social Council declared itself to be keenly aware of the important 
contributions to the improvement of world statistics which have been 
made by the International Statistical Institute and other organizations 
in this field and expressed its desire that recognition be accorded to 
their work. It hoped particularly that appropriate means could be de- 
vised to bring the International Statistical Institute into harmonious 
and mutually advantageous relationship with the United Nations. One 
of the ways to do so would be to proceed under the authority of Article 
71 of the Charter, although the Commission has deferred until a later 
time the formulation of recommendations as to specific methods by 
which such recognition might be expressed through the United Nations, 
and as to ways in which such organizations might be related to the 
United Nations and their activities utilized in fostering international 
cooperation in the improvement of statistics. The knitting of various 
private international statistical organizations into the world system is, 
therefore, a concern of the United Nations. 

The Statistical Commission directed attention to the Institute (some- 
times called ISI) because it is one of the oldest scientific organizations 
and because of its commanding reputation and influence in interna- 
tional statistical affairs. The ISI assembled for its last world conference 
in Prague in September 1938 but was forced by the imminent Nazi in- 
vasion to abandon its meeting in a hastily called midnight parley, 
without doing any business. Its last regular session was therefore held a 
decade ago in Athens; since then there has been no new election of 
officers nor other official business except that conducted through the 
Permanent Office at The Hague. Even this was interrupted by the war 
and the occupation of the Netherlands, which disrupted communica- 
tions with the outside world. Although the Institute is privately organ- 
ized and administered, having come into existence as the successor to 
the international statistical congresses initiated by the Belgian Que- 
telet, it was and is supported in part by subventions from the principal 
governments of the world, including the United States. The usual 
United States contribution was last received for the fiscal year 1939- 
1940. During the war, the appropriation was omitted from the budget 
estimates of the Department of State, but it has been restored in the 
estimates for fiscal year 1947. Plans and arrangements are being made 
for the Twenty-fifth Session of the Institute to be held in Washington 
September 13-25, 1947. There is general agreement that this session 
will be of the utmost importance, not only for the Institute but for the 
development of international statistical activities everywhere within the 
new world order. The Institute’s own organization must be repaired 
and revitalized after the attrition of ten years, and its future course 
must be redrawn and redirected. 

In the course of time the Permanent Office has tended to concentrate 
its attention upon demographic problems, and the suggestion has 
been made that the United Nations might use the Permanent Office of 
the Institute for whatever work in demography the United Nations 
may require. Consideration of this course, however, will have to take 
account of the plans being made to establish a Demographic Commission 
within the structure of the United Nations.6 Even if these plans 
did not exist, it would probably be preferable if the collection and 
assembly of international statistical data were centralized in the 
formal institutions of the United Nations and its affiliated agencies. To 
separate and parcel out various subject matter fields of statistics would 
introduce an element of dispersion contrary to the synthesis which is 
forming around the United Nations. 

There are other and more feasible suggestions for the future role of 
the ISI in a world statistical order. Some members of the Statistical 
Commission have thought it might assume leadership in the develop- 
ment of world statistical systems. This is analogous to the activities 
of the Inter American Statistical Institute in the Western Hemisphere, 
which have proved very useful. The main techniques would presumably 
be those of advice, education, and promotion in statistically undeveloped 
areas of the world. Another proposal is that the Institute maintain a world 
statistical research center at the site of its Permanent Office, a place at 
which the frontiers of statistical research would be explored. Under this 
proposal, arrangements could be made to provide world statistical 
scholars with leaves of absence to serve as directors of research in resi- 
dence, for fixed terms. Such a role would provide the Permanent Office 
with a program of continuing responsibilities, shifting its activities 
from those of a casual and episodic type, to those permitting long range 
planning and development. Still another proposal for the Institute 
would utilize it as a “supreme court” of statistics, an international 
academy, to which various questions of professional competence would 
be referred for judgment and opinion by the United Nations and mem- 
ber nations, very much as the American Statistical Association is occa- 
sionally called upon by agencies of the United States Government to 
study and make recommendations upon statistical questions. Finally, 
it has been suggested that the Institute might perform services for the 
improvement of the work and methods of the statistical fraternity much 
as the American Bar Association looks after the interests of the legal 
craft and seeks to improve its professional standing and practices. 

6 The United Nations Preparatory Commission recommended to the Economic and Social Council 
that the latter should consider “the desirability of establishing at an early date, and possibly at its first 
session” three commissions of which one was to be a Demographic Commission. The functions suggested 
for the proposed Demographic Commission were those of study and advice to the Council on matters 
related to: (1) population growth and the factors determining it; (2) the effectiveness of policies designed 
to influence these factors; (3) the bearing of population changes on economic and social conditions; 
and (4) general population and immigration questions. The matter was referred to the Social Committee 
of the Economic and Social Council (a drafting committee) which has not yet made its recommenda- 
tions. See “Report of the Preparatory Commission of the United Nations,” 1945, pp. 28, 38. 

There may be various other possibilities and the proposals made are 
hardly more than suggestions of alternative lines of development that 
have been mentioned by those interested in the problem that confronts 
the Institute. As an independent private group, the Institute will itself 
have to decide upon the role it wants to follow in the future: the United 
Nations will not dictate the course it takes. The United Nations, 
however, would welcome the affiliation of the Institute in some useful 
and practicable permanent relationship; and there seems to be ample 
room for a negotiation of terms satisfactory to both institutions. A 
meeting of the Statistical Commission will be held at about the same 
time as the scheduled Twenty-fifth Session of the Institute in Washing- 
ton. By that time, specific proposals of agreement may have been 
worked out for consideration by both the Commission and the Insti- 
tute, and it may then be possible to effect an agreeable arrangement 
fixing the Institute’s permanent role and future relationship in the 
world statistical system. The extent to which other international sta- 
tistical organizations are brought within the conspectus of the United 
Nations is a question upon which it is difficult yet to make any judg- 
ment; but the doors of the United Nations have been opened wide by 
the Statistical Commission. 


NATIONAL STATISTICAL SYSTEMS 


Action parallel to the formation of an international statistical system 
will be required in many countries, to order arrangements in such a 
way that maximum participation in international statistical activities 
will result. In countries which are statistically undeveloped, the or- 
ganization of a basic national statistical system will be necessary, per- 
haps with the help and advice of the statistical services of the United 
Nations, its organs, or affiliates. Even in statistically developed coun- 
tries, it will probably be necessary to designate focal points for dealing 
with the international organizations. In countries whose statistical 
services are highly centralized, the focal point ordinarily will be the 
central statistical bureau. In countries with decentralized statistical 
services, some focal point will have to be organized. In the United 
States, the statistical agencies operate under conditions of decentrali- 
zation; it will be necessary therefore to create a clearing house for 
dealings with international statistical agencies, and steps to do so are 
going forward. 

The first step was taken over a year ago when a conference of 
Federal statistical officials was called to consider the preparation of a 
statement of the need for international statistical services for use by 
the United States delegation at the San Francisco Conference. More 
recently, further steps have been taken to make a permanent organiza- 
tion out of the conference. On March 21 of this year, the same group 
of officials met at the invitation of the Division of Statistical Standards 
of the Bureau of the Budget to organize themselves as the Federal 
Committee on International Statistics. The membership of this com- 
mittee is drawn from the principal agencies having important inter- 
national statistical interests, including State, Treasury, Agriculture, 
Commerce, Labor, Export-Import Bank, Securities and Exchange 
Commission, United States Public Health Service, Office of Education, 
United States Maritime Commission, United States Tariff Commission, 
Federal Reserve System, Social Security Board, and the Civil Aero- 
nautics Board. One of its first undertakings was the preparation of 
an inventory of international statistical activities carried on by the 
Federal Government. 

The chief concerns of the Federal Committee on International Statis- 
tics will be the clearance of demand and supply arrangements with 
international bodies for the receipt and supply of data; the coordina- 
tion of the international statistical requirements of the United States 
Government; the avoidance of duplication in the collection of statistics 
required by the United Nations, its organs and affiliates; and the clear- 
ance of intern arrangements and the loan of experts. Within the frame- 
work of coordination by the Committee, there will be every encourage- 
ment for direct contacts between the statistical technicians and spe- 
cialists in the United States and their colleagues in other parts of the 
world. Full utilization of the statistical resources of the United States 
will be fostered through direct dealings with international organiza- 
tions, with the knowledge and guidance of the Federal Committee on 
International Statistics. 

Three specific instances of the way in which the Federal Committee 
will operate can be cited. First, the recommendation of the Economic 
and Employment Commission already mentioned will, if it is approved by 
the Economic and Social Council, require the assistance of several 
Federal agencies in the collection or assembly of complete data on plant 
capacity and labor force. How should this assignment be distributed 
among the Federal statistical family? It would seem evident that some 
central action may be necessary, and this action the Federal Committee 
would be in a position to supply; although it would not dictate the 
distribution, the distribution would be made with its assistance and 
certainly with its knowledge. Second, the inventory of Federal inter- 
national statistical activities which is being made at the suggestion of 
the Federal Committee will lay the foundation for the designation of 
focal points among the Federal agencies for dealing with certain kinds 
of international statistical business. It will also make it possible to 
estimate the amount, kind, and incidence of international statistical 
business and the responsibilities which participation in the world 
statistical system imposes. Third, the Federal Committee will be available 
as a starting and reporting point for missions of United States 
officials abroad on statistical business. In this fashion, traveling officials 
will have the benefit of the advice and counsel of the Federal statistical 
family, will be able to represent the views of that group on proper occa- 
sions, and will be able to perform various kinds of service abroad of 
benefit to American statisticians, such as getting in touch with those 
whose activities and whereabouts are in doubt. 

These, at least, are the plans. The agencies of the United States 
Government have a stake in the success of world statistical organiza- 
tion. Although the committee device is no sovereign cure for the risks 
of administrative dispersion and diffusion, the organization of the 
Federal Committee on International Statistics may help the attain- 
ment of statistical unity in international affairs by promoting it at 
home. Because the Committee is sponsored by the Division of Statistical 
Standards of the United States Bureau of the Budget, its activities 
become part of the general program for the coordination and improve- 
ment of statistics among all Federal agencies, for which the Division has 
been made responsible. Under these auspices, the statistical world 
abroad joins the statistical world at home in common purpose and 
program. 



OBJECTIVES, USES AND TYPES OF LABOR FORCE DATA 
IN RELATION TO ECONOMIC POLICY* 


LOUIS J. DUCOFF AND MARGARET JARMAN HAGOOD 
Bureau of Agricultural Economics 


The kinds of labor force statistics developed should be de- 
termined by their uses. The full employment goal of national 
economic policy in the United States gives rise to the need for 
labor force statistics which can serve as a barometer of the 
state of functioning of the economy and which can identify 
the sectors of the Nation’s workers for whom the economy is 
not providing full employment opportunity. This treatment is 
primarily of the implications of such objectives for the types of 
labor force data needed, although it is recognized that labor 
force data have many other uses. 

Labor force statistics include data obtained from employing 
establishments, registrations, and population surveys, with 
each type having advantages for certain uses. Labor 
force data from population surveys need to be expanded to 
provide differentiated categories of the unemployed and to 
identify those employed workers whose employment is inade- 
quate because it is insufficient in amount or is remunerated at 
substandard rates. The development of these and further 
geographic differentiations would increase the utility of labor 
force data in relation to economic policy. 


THE USES to be made of any statistical series have important implications 
for the types and nature of data which should be obtained. 
The uses determine the sources, concepts, differentiations, and fre- 
quency of issuance of counts or estimates of various types of economic 
phenomena. Because the labor market activity of persons of working 
age affects every sector of the economy, labor force statistics have a 
wide variety of uses. In determining the concepts and differentiations 
that underlie the labor force data currently obtained, many types of 
uses need to be taken into account. 

In this paper we will indicate some major objectives of labor force 
data and the ways in which different types of labor force data serve 
these objectives. We will then discuss underlying concepts and certain 
developments which need to occur to broaden the uses of labor force 
statistics. In covering these subjects, the full employment goal of national 
economic policy in the United States will be singled out for special 
emphasis, although it should be recognized that the needed developments 
in labor force statistics are not at all dependent upon the 
inauguration of any full employment policies. 

* This paper and the papers by Gertrude Bancroft and Emmett H. Welch, and Charles Stewart 
and Loring Wood, which follow it immediately, were planned as a unit and were presented at the 
session on Labor Force Measurement and National Employment Policy at the 105th Annual 
Meeting of the American Statistical Association at Cleveland, Ohio, January 24, 1946. 


I. OBJECTIVES OF LABOR FORCE, EMPLOYMENT AND UNEMPLOYMENT 
DATA IN RELATION TO ECONOMIC POLICY 


A major objective of labor force statistics is to serve as a barometer 
which indicates currently the state of functioning of the economy as this 
is manifested in levels of employment and unemployment of the Na- 
tion’s workers. Other types of statistics on production, income, etc., 
also indicate the general state of the economy, but no other type re- 
flects as directly the changes in work opportunities. Labor force sta- 
tistics are obviously required in planning or appraising general economic 
and labor policy, as well as specific measures designed to implement 
such policy. In this function these statistics serve not only government 
agencies, but also management, labor, and various groups with diverse 
interests in economic and labor policy. 

Another objective of labor force statistics is to aid in diagnosing a 
given national economic situation through identification of the sectors 
of the Nation’s workers for whom the economy is not providing full 
employment opportunity. In times of less than full employment, it is 
not enough to know simply the total number of unemployed workers. 
To diagnose the situation in order to appraise the steps proposed for 
remedying it, much more detail is needed. The location and character- 
istics of the wholly unemployed are needed, as well as their aggregate 
volume. Among those who have some employment, identification is 
needed of underemployed workers—that is, the partially unemployed. 
There is also the need to identify among the employed those who are 
remunerated at rates below some minimum level of adequacy implied 
in the goal or standard of “full employment opportunity.” 

Changes in the level of functioning of the economy also should be 
diagnosed according to the industries which are expanding or curtailing 
employment. Employment statistics may be viewed both as a measure 
of opportunities afforded workers, and as a measure of the state of func- 
tioning of various industrial sectors of the economy. Employment data 
by industries are especially needed for the planning of specific policies 
and programs relating to particular industries. 

In the formulation or appraisal of economic policy labor force sta- 
tistics serve another objective. They supply the most important part of 
the quantitative basis for making labor force projections into the fu- 
ture, under various assumptions as to the state of functioning of the 
economy. The projections made under the assumption of full employ- 
ment have especial usefulness. On the one hand, they can provide a 
more explicit formulation of what the goal of peacetime full employment 
would mean in terms of number of jobs to be provided. On the 
other hand, full employment projections can serve as a standard against 
which a given current national economic situation can be diagnostically 
appraised. 

In summary, therefore, labor force data must be designed to indicate 
the state of functioning of the economy in providing employment op- 
portunities, to identify and measure the groups of workers for whom 
full employment has not been obtained, and to help formulate national 
employment goals. 


II. TYPES OF LABOR FORCE, EMPLOYMENT AND UNEMPLOYMENT DATA 
REQUIRED FOR VARIOUS USES 


Sources of employment and unemployment data. The term “labor force 
data” in a broad sense includes all types of data, however derived, on 
employment and unemployment or the total labor force. In technical 
usage, the term “labor force data” is restricted to data from only one 
of the three principal sources of data on employment and unemploy- 
ment. In this narrower sense, statistics developed from classification 
of the population of working age according to their current labor mar- 
ket status are designated as “labor force statistics,” or sometimes as 
population classification statistics. Determination of the current labor 
market status of individuals provides the basis for estimates or counts 
of the total labor force, its employed and unemployed components, and 
of major categories of the remaining nonworker population. Labor force 
statistics of this type were obtained in the 1940 Population Census for 
the entire population of working age, and have been collected on a sam- 
ple basis since then in surveys known as the Monthly Report on the 
Labor Force, initiated by WPA under the direction of Howard B. 
Myers, and transferred to the Bureau of the Census in 1942. 

A second source of current employment data is the reports of estab- 
lishments employing workers. Reports may be made on a voluntary 
basis, as in the case of nonagricultural establishments which report 
the number of their employees to the Bureau of Labor Statistics, and of 
farmers known as crop reporters who report their farm employment to 
the Bureau of Agricultural Economics. Reports may be legally required 
from establishments meeting certain criteria, as in the case of reporting 
required from employers in the administration of Old-Age and Survivors 
Insurance and Unemployment Compensation programs. The biennial 
Censuses of Business and of Manufactures, which were interrupted 
during the war, also obtained employment data through reports from 
employers. Employment data secured in these ways are known as 
establishment reports as distinguished from the type of employment 
data obtained from population classification in labor force surveys. 

1 The importance of the difference between voluntary and legally required reporting rests on the 
fact that the former is subject to bias from selective response, while the latter is largely free of it. 

In addition to population surveys, another source of unemployment 
data is the registration of individuals who file claims for unemployment 
compensation at local offices of each State’s Unemployment Compen- 
sation Agency. Compilation of these registrations permits a national 
total for workers who are claiming these benefits. 

Uses served by the various types. Each of these three types of data 
can serve important uses. Labor force data derived from population 
surveys have the following advantages which make them superior for 
certain uses: 

1. They provide the only source for estimates of the total labor force, 
for total employment and for total unemployment, without dupli- 
cation between industries within the employed group, or between 
the employed and unemployed classification. Thus for a current 
cross-section view of utilization of the Nation’s manpower, or 
for the record of change in the total labor force and its com- 
ponents, labor force data derived from population surveys pro- 
vide the only complete picture. For these uses, it is essential that 
the estimates of the various components of the labor force be 
additive. 

2. They provide the only source for deriving labor force participa- 
tion rates—that is, the proportion of each age-sex group who are 
engaged in labor market activity. Labor force participation rates 
are used in current analyses and are the starting point in any full- 
employment projection for obtaining estimates of the number of 
workers for whom jobs must be provided at some given future 
date. Labor force statistics obtained from a population classifica- 
tion are therefore essential for the setting of full-employment goals. 

3. Labor force data provide the possibility of subclassifications 
within the employed, unemployed and nonworker categories 
which can usually be made only when the report is obtained on an 
individual basis rather than from an employer reporting for a 
group of workers. Some of the important items now secured for 
further differentiation are age, sex, veterans’ status, industry, oc- 
cupation and time worked during the reporting week. Other items 
are proposed later for further differentiation of the employed and 
unemployed which would enhance the diagnostic function of these 
statistics. 
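
The first two advantages listed above can be illustrated with a short sketch. The figures are invented for illustration and are not taken from the Monthly Report on the Labor Force; the cell labels, the `Group` class, and the function `project_labor_force` are likewise assumptions introduced here. The sketch shows that components from a population classification are additive, and that current participation rates applied to an assumed future population give a rough count of the workers for whom jobs would have to be provided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """One age-sex cell of a population classification (hypothetical, in thousands)."""
    label: str
    population: int   # persons of working age in the cell
    employed: int
    unemployed: int

    @property
    def labor_force(self) -> int:
        # Each person is classified exactly once, so the components add up.
        return self.employed + self.unemployed

    @property
    def nonworkers(self) -> int:
        return self.population - self.labor_force

    @property
    def participation_rate(self) -> float:
        # Proportion of the cell engaged in labor market activity.
        return self.labor_force / self.population


def project_labor_force(groups, future_population):
    """Apply current participation rates to an assumed future population by cell."""
    return sum(g.participation_rate * future_population[g.label] for g in groups)


# Illustrative figures only (thousands of persons).
groups = [
    Group("men 20-44", population=1000, employed=900, unemployed=50),
    Group("women 20-44", population=1000, employed=300, unemployed=20),
]

# Assumed future population of each cell (also invented).
future_population = {"men 20-44": 1100, "women 20-44": 1200}
jobs_needed = project_labor_force(groups, future_population)

for g in groups:
    print(g.label, round(g.participation_rate, 3))
print("projected labor force (thousands):", round(jobs_needed))
```

A fuller projection would also allow the participation rates themselves to change over time; holding them fixed is a simplifying assumption of this sketch.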

On the other hand, establishment reported statistics on employment 
have certain advantages over those derived from population surveys, 
which make them superior for certain uses: 


1. Because they are obtained for relatively large groups of workers 
in any one report and usually by mail, establishment reported sta- 
tistics are less expensive than those obtained from population 
enumeration. Hence, they can be obtained for finer geographic 
and industrial breaks without incurring prohibitive costs. Al- 
though current employment estimates for regions, larger states 
and the more important metropolitan areas could be obtained 
through the expansion of current labor force surveys proposed by 
the Bureau of the Census, establishment-reported statistics on 
employment are now the only type available in some geographic 
detail, except where surveys have been conducted for individual 
cities. In general, therefore, analyses of current employment re- 
quiring geographic detail now necessarily have to be based on 
establishment-reported data. 

2. Establishment-reported data provide superior industrial classifi- 
cations of the employed because: (a) the reports on employment 
are accompanied by other information which permits an accurate 
classification of the establishment by standard industry group- 
ings, (b) the system of collection is by industry which makes for 
better representation of industries than in the case of population 
surveys in which other criteria are the major basis of stratifica- 
tion, (c) because of the first major advantage mentioned, namely 
inexpensiveness, the separate industries are more adequately 
covered than in population surveys. Therefore, establishment re- 
ported statistics have definite advantages in employment analyses 
for specific industries. 

3. Establishments reporting employment data can and often do sup- 
ply related information on production, man-hours worked, and 
wages which permit productivity and wage analyses by indus- 
tries.2 Although data on wage earnings have been occasionally 
secured in population surveys, those reported by employers are 
doubtless more accurate generally, since their payrolls are a mat- 
ter of record. However, they have limitations for certain purposes 
since they do not represent total earnings of individual workers in 
many cases. 


The unemployment data derived from registration of claims for un- 
employment compensation have present and potential uses. At present, 


2 Man-hours data for a given industry as reported by establishments are more accurate because in 
population surveys the total time worked during the reporting period by individuals who hold more 
than one job is all allotted to the one industry in which greater time was worked. 
their usefulness is limited because not all types of employment are cov-
ered by the legislation, because States vary in their requirements for
eligibility, and because compilations at the national level are therefore 
somewhat noncomparable and difficult to interpret. On the other hand, 
if these difficulties were overcome, there would become available unem- 
ployment data with a very considerable amount of geographic detail. 

Reference is made in the last paper on this program* to specific 
examples of the uses of population and establishment reported statis- 
tics. The remainder of this paper is devoted to consideration of what 
the uses of labor force statistics in relation to economic policy imply 
for the concepts and classifications to be used in labor force surveys 
involving population classification. 


III. IMPLICATIONS OF USES FOR BASIC CONCEPTS IN 
LABOR FORCE SURVEYS 


One implication of the uses of employment, unemployment and labor 
force data in relation to economic policy is that the basic concepts must 
be designed for reflecting changes in the national employment scene. 
This means that the concepts of employment and unemployment must 
relate to a relatively short time period and that the measurement proc- 
ess must be made at frequent intervals. Both the 1940 Population Cen- 
sus and the MRLF surveys afford labor force statistics geared to ac- 
tivity or status during a specified current week. The use of labor force 
statistics to indicate short-time as well as long-time changes requires 
that the basic concepts of employment and unemployment should re- 
late to a specified short-time period rather than to a “usual” status, 
such as in the “gainful worker” figures of the decennial censuses prior 
to 1940.3 It is recognized, however, that for certain purposes, classifica-
tions based on longer time periods may be preferable, or at least desira-
ble for supplementation.

Another implication of the uses of labor force data for the concepts is 
that they must provide for a classification of all persons of working age 
into labor force status groups which are mutually exclusive, so that the 
resulting estimates will be additive. This requires carefully specified 
priorities of status for those persons who have dual status during the re- 
porting week. The priorities of status in current labor force measure- 
ment are: (1) at work, (2) unemployed, (3) with a job but not at work, 
and (4) nonworker status. Those at work and those with a job but not 

* See page 313. 

3 Since the monthly labor force surveys of the Bureau of the Census are the only source of current 
national statistics derived from labor force classification of individuals, the rest of this paper refers 
primarily to the labor force statistics developed from the MRLF surveys. However, many of the mat-
ters discussed are generic to any labor force statistics, whether gathered for a city or some other area,
as well as for the country as a whole, and whether gathered monthly, yearly, or in a one-time survey. 
at work are added to get the total employed. In dual status cases, the
person is classified in the category with higher priority. For example, 
persons who were at work for some part of the week, but unemployed 
and seeking work during an even larger part of the week are classified as 
employed. If a person spent most of the week in a nonworker status 
such as doing housework at home or going to school, but did gainful 
work during some part of the week, he or she is also classified as em- 
ployed. A person whose major time is spent in some nonworker activity, 
or a person who has a job at which he did not work but who reported 
looking for work during the week is classified as unemployed. 
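
The priority rule just described can be sketched in Python. This is a minimal illustration rather than the Census Bureau's actual procedure; the status labels and the function names `classify` and `is_employed` are assumptions introduced here.

```python
# Priority order given in the text: an earlier position means higher priority.
PRIORITY = ["at work", "unemployed", "with a job but not at work", "nonworker"]

# Statuses that are added together to obtain the total employed.
EMPLOYED = {"at work", "with a job but not at work"}


def classify(statuses):
    """Return the single highest-priority status among those reported for the week."""
    reported = set(statuses)
    if not reported:
        raise ValueError("at least one status is required")
    unknown = reported - set(PRIORITY)
    if unknown:
        raise ValueError(f"unknown status: {sorted(unknown)}")
    # The amount of time spent in each status plays no part in the choice.
    return min(reported, key=PRIORITY.index)


def is_employed(statuses):
    """True when the assigned status counts toward total employment."""
    return classify(statuses) in EMPLOYED
```

Because each person receives exactly one status, the group totals are additive, which is the property the text requires.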

The uses of labor force data required that some set of priorities be 
adopted which could be precisely defined and put into effect with a fair 
degree of uniformity. They require further that borderline groups be 
expressly allotted to one status or another so that they too will be 
treated uniformly. Consideration is being given to the possibility of 
changing the allocation of certain borderline groups. Certain groups of 
workers now classified as “with a job but not at work” are being con- 
sidered for inclusion among the “unemployed,” which has a higher pri- 
ority. Persons on lay-offs of less than 30 days and certain other nu- 
merically small groups who did not report looking for work have been 
classified as “with a job” and included among the employed. Because 
they were “involuntarily idle,” however, consideration is being given 
to including them with the unemployed. On the other hand, persons 
without a job who report that they were not looking for work because 
of temporary sickness or because they believed no work available are 
classified as unemployed, although the case could be made that some 
of these should be treated as nonworkers. 

To reflect accurately changes in employment and unemployment, the 
criteria must be adapted to distinguish the employed, the unemployed 
and nonworkers in periods of varying economic conditions. Because 
different national situations call forth different types and degrees of 
labor market activity on the part of individuals, the general principle 
observed in the development of labor force concepts has been toward 
making the “in labor force” category as inclusive as possible. This calls 
for introducing further differentiations according to the degree of par- 
ticipation so as to permit measuring changes for comparable groups. 


IV. IMPLICATIONS OF USES FOR DIFFERENTIATIONS WITHIN 
THE EMPLOYED AND UNEMPLOYED COMPONENTS OF 
THE LABOR FORCE 


The barometric, diagnostic, and projection uses of labor force sta- 
tistics have gained importance as the idea of full employment became 
more widely accepted as a goal for the Nation’s peacetime economy.
Moreover, the growth of the idea of full employment has meant that 
further differentiations of the employed and unemployed are required 
to serve the uses described. 

The virtual elimination of unemployment during wartime raised the 
challenge to eliminate all but frictional unemployment in peacetime, 
and “full employment” was defined as a situation with unemployment 
at a minimum or frictional level. But this naturally led to a comple- 
mentary definition in terms of the number of jobs needed to keep un- 
employment to the specified minimum level. Projecting the number of 
jobs required for full employment rests primarily on projecting the 
number of persons who will be in the labor force. Such projections take 
as a starting point the past record of labor force participation of the 
different age-sex groups of the population under varying economic con- 
ditions. Thus differentiations of labor force statistics by age, sex, 
marital status, etc., are needed as a primary basis for the labor force 
projections which make explicit what full employment goals mean in 
terms of the number of workers to be supplied with jobs. 
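
The arithmetic behind such a projection can be sketched as follows. The formula, the group labels, and all figures are assumptions made for illustration; the paper gives no numerical method or data for this step.

```python
def jobs_for_full_employment(population, participation_rate, frictional_rate):
    """Project the labor force by group and the jobs needed to hold
    unemployment at an assumed frictional level.

    population and participation_rate are dicts keyed by age-sex group.
    frictional_rate is the assumed minimum share of the labor force unemployed.
    """
    labor_force = sum(population[g] * participation_rate[g] for g in population)
    jobs_needed = labor_force * (1 - frictional_rate)
    return labor_force, jobs_needed


# Hypothetical figures, for illustration only.
population = {"men 20-64": 1000, "women 20-64": 1000}
participation_rate = {"men 20-64": 0.9, "women 20-64": 0.3}
labor_force, jobs_needed = jobs_for_full_employment(
    population, participation_rate, frictional_rate=0.05
)
```

The sketch shows why participation records by age and sex are the primary input: the projected labor force, and hence the job target, is a sum over groups whose rates differ widely.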

The growth of the idea of peacetime full employment did not stop 
with the mere specification of a sufficient number of jobs. More re- 
cently, various writers and groups have added criteria as to the kinds 
of jobs consistent with the idea of full employment. Full employment 
as a goal means, at least for some, a sufficient number of productive 
jobs, adequate in the regularity and amount of work afforded, and pro- 
viding adequate remuneration. 

Such an expanded goal sets new problems in determining and de- 
veloping the differentiations within the employed and unemployed 
which would best serve the function of indicating the extent and na- 
ture of departure from the goal which a given national employment 
situation represents. The problems of full employment projections also 
require certain types of labor force data which have not been previously 
available. Many of the needs could be met by further classifications 
of the employed and unemployed according to data now being obtained. 
Other needs would require additional information which could be ob- 
tained in current labor force surveys, particularly if the size of the 
sample were somewhat increased. 

Perhaps the most important additional differentiation needed in cur- 
rent labor force statistics is geographic. Especially if labor force statis- 
tics are to serve a diagnostic function, they must indicate where the 
trouble spots are. Complete geographic identification of the individuals 
covered by the MRLF surveys is already obtained on the schedule. 
But an expansion of the scope of the survey operations would be re- 
quired to provide samples large enough to permit valid estimates of 
current employment and unemployment for regions, by residence
groups of the population, for metropolitan cities, and for States. 

Differentiations needed among the employed. The most important need 
for additional differentiation among the employed is to separate those 
workers whose employment is sufficient and remunerative from those 
whose employment fails to meet these criteria. In a fuller treatment of 
this problem, we have for want of accepted terminology referred to the 
first group as “adequately employed” and to the second group as “in- 
adequately employed.” Measurement of the inadequately employed 
may be as important as measurement of the unemployed for indicating 
the degree to which a given national situation falls short of meeting full 
employment goals. 

Measurement of the inadequately employed would involve the de- 
velopment of labor force survey techniques for identifying among the 
employed: (a) those who do not have a sufficient amount of work, the 
underemployed, and (b) those who get substandard returns per hour 
of work because of its low productivity (mainly self-employed or un- 
paid family workers) or because they are working for substandard 
wages. In appraising the state of functioning of the economy at any 
given time, the size of these two groups as well as the number of wholly 
unemployed needs to be known to indicate the total number of workers 
for whom full employment has not been attained. In times of depressed 
economic conditions especially, the presence of large numbers of under- 
employed among those classified as employed would need to be recog- 
nized and estimated separately from the voluntarily part-time workers. 
Even more continuously, in times of high as well as low employment 
levels, is there need for measuring the number of workers who have a 
sufficient amount of work but who are inadequately remunerated. Es- 
pecially as the nation approaches peacetime full employment goals in 
terms of number of jobs will attention need to be directed toward pro- 
viding employment data which identify the groups of workers whose 
jobs or enterprises do not meet minimum standards of adequacy. 

Differentiations needed among the unemployed. During the past, the 
total number of unemployed persons has been relied upon as an im- 
portant indicator of the state of functioning of the economy. In the 
present transition period from war to peace unemployment may be 
of relatively short duration for most of the workers involved. A sub- 
stantial proportion of the unemployed may be extra workers who 
came into the labor force in response to the economic situation brought 
about by war and who are in a state of indecision as to whether to 


4 “Labor Force Definitions and Measurement in Relation to Employment and Income Levels,” 
preliminary draft of a report prepared for the Subcommittee on Labor Statistics of the Labor Market 
Research Committee of the Social Science Research Council, November 1945. 
continue gainful work or to withdraw from the labor force. A few
years later, the total number of unemployed workers may be of a 
very different composition. To interpret the meaning of a given na- 
tional level of unemployment, it will be necessary to differentiate 
the total according to the degree of the worker’s attachment to the 
labor force and duration of unemployment, in addition to certain sub- 
classifications such as age, sex, marital status, residence and occu- 
pational skills. 

The following four classes of the unemployed are suggested as the 
types of differentiations needed in using unemployment data in diag- 
nosing a given national employment situation: 

1. Unemployed persons who are customarily members of the labor
force and who are seeking full-time jobs;

2. Unemployed persons who are customarily members of the labor 
force but who are seeking only part-time employment; 

3. Unemployed persons who have never had jobs but who are seek- 
ing to enter the labor force on a permanent basis; 

4. Unemployed persons who have not customarily been in the labor 
force and who are seeking only temporary work because of some 
special need or because of unemployment of other breadwinners. 
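
The four classes above depend on three items of information about each unemployed person. A minimal sketch in Python, in which the argument names and the coding of the `seeking` answer are hypothetical and not taken from any survey schedule:

```python
def unemployed_class(customary_member, ever_worked, seeking):
    """Assign an unemployed person to one of the four suggested classes.

    customary_member: person is customarily a member of the labor force
    ever_worked: person has held a job before
    seeking: "full-time", "part-time", "permanent", or "temporary"
    Returns 1 to 4, or None when the answers fit none of the four classes.
    """
    if customary_member and seeking == "full-time":
        return 1
    if customary_member and seeking == "part-time":
        return 2
    if not ever_worked and seeking == "permanent":
        return 3
    if not customary_member and seeking == "temporary":
        return 4
    return None
```

Cases returning `None` correspond to persons the four classes do not cover and who would need separate treatment.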

In addition to these classes, certain groups of the unemployed not 
readily identifiable need especial study. These include persons who 
would like to have regular work but who because of age, lack of occu- 
pational skill, physical or mental handicaps or discriminatory labor 
practices are considered unemployable. 

Problems in obtaining differentiations needed. Some of the information 
which would be required for obtaining the differentiations needed of 
the employed and unemployed is already being obtained in the current 
MRLF surveys, but the size of the sample does not permit full utiliza- 
tion of it in cross-classifications. Obtaining other information would 
require special questions to be added to the schedule. Some of the 
questions would involve types of information differing from the type 
now obtained with respect to subject, for example, wages received dur- 
ing week—or with respect to time reference, for instance, major work 
status during last 12 months. While the data obtained should be as 
objective as possible, some questions of a more subjective nature than 
those now used might be required for identifying certain groups—for 
example, a question on desire for more work to identify the under- 
employed. Devising of schedule and survey techniques for achieving 
these differentiations poses challenging problems in the field of labor 
force measurement. The next paper will deal with some of these meas- 
urement problems. 
RECENT EXPERIENCE WITH PROBLEMS OF
LABOR FORCE MEASUREMENT 


GERTRUDE BANCROFT AND EMMETT H. WELCH
Bureau of the Census 


Different uses for labor force data often require different 
concepts and definitions which are sometimes difficult to rec- 
oncile. It is important to develop, as a part of measurement, 
procedures that will provide data which are suitable for the 
various uses and also comparable over time. The experience 
of the Census Bureau in the operation of its monthly sample 
survey during the last few years has revealed many of the 
problems attendant upon putting into actual practice the 
agreed upon concepts. 

In a recurring enumeration of a sample of households, ques- 
tions on which classifications are based must be such that they 
can be asked repeatedly. They must be simple and objective 
and not place too great a strain on the respondent’s memory. 
The basic classifications finally set up must be large enough 
to be determined fairly reliably by the sample.


THE PRECEDING paper outlined some of the uses and purposes to be
served by labor force data and indicated additional concepts and
classifications of data needed. Recent experience in measuring and 
classifying the population in accordance with present labor force con- 
cepts indicates a number of problems that would have to be dealt with 
in expanding the pattern of concepts and classifications used. The first 
difficulty in obtaining adequate labor force measurements is to reach 
agreement on concepts and definitions of what is to be measured. Often 
different uses require different concepts and definitions. For example, 
if one is using unemployment figures to represent available labor sup- 
ply, persons with a job but not working and not looking for work would 
not be included among the unemployed. On the other hand, if one is 
using unemployment figures to represent the interruption or lack of 
earnings, it might seem desirable to add to the unemployed those with 
a job but temporarily not working because of lay-off, strike, bad 
weather, etc. Present labor force measurements classify separately 
persons having jobs but not working for various reasons. It has been 
suggested that such persons might be classified as unemployed at one 
time and as employed at another, depending upon the purposes to be 
served by the data. This immediately introduces a second difficulty, 
namely, that it is not practicable to have two different unemploy- 
ment series. A possible solution is not to have one unemployment 
series at all but to have a series representing persons looking for 
work, another series representing persons with a job and not looking
for work, etc. This solution would not please those persons who wish 
to have some simple, generally accepted term such as unemployment. 
The practice which has been followed up to the present and one which 
we personally believe may be found reasonably satisfactory for the 
future is to define unemployment in such a way as to be most generally 
useful and then to provide subclassifications of employment and un- 
employment that will enable other groupings to be prepared for spe- 
cial purposes. This approach seems desirable for another reason. It pro- 
vides some flexibility in concept while providing continuity and com- 
parability over time. 

After a set of concepts and definitions acceptable to the users of the 
data has been obtained, another and more serious difficulty is to insure 
that the concepts and definitions that are agreed on are actually trans- 
lated into measurement. The concept of employment in current use, 
for example, includes all persons engaged in some activity for pay or 
profit. This includes persons working for wages or salary, employers, 
own account workers, and persons working on a family farm or in a 
family business. It includes persons working long hours, those working 
short hours, and those with a job at which they are not currently 
working because of vacation, illness, bad weather, strike, or temporary 
lay-off. 

Many persons have preconceived ideas of employment or unemploy- 
ment which differ considerably from the concepts and definitions in 
terms of which measurements are being sought. The problem of obtain- 
ing an accurate measurement of labor force concepts by means of popu- 
lation enumeration and classification is to devise schedule questions and 
definitions which will insure that persons are classified in accordance 
with the desired concepts and definitions rather than in accordance 
with the respondent’s ideas as to labor force classification. 

It is to be expected that there will always be a considerable variation 
of response in the labor force classification of individuals, arising from 
difference in enumerator and difference in respondent. These variations 
in response become serious if groups of persons with certain charac- 
teristics tend to be persistently misclassified according to the concepts 
and definitions being applied. The experience we have had in measuring 
employment provides a rather striking illustration of this problem. 

Since its inception, the Monthly Report on the Labor Force has at- 
tempted to achieve objective measurement of employment in terms of 
all persons who, during the census week, did any work for pay or profit, 
including those who worked on a family farm or business without 
specific wages. During the past several years considerable evidence
has accumulated to indicate that the enumeration of employment in 
the 1940 Census of Population and in the Monthly Report on the 
Labor Force, or the MRLF, had been incomplete because of: first, 
a failure to include as employed a considerable number of persons such 
as housewives and students who did not consider themselves primarily 
as workers and second, a failure to include large numbers of unpaid 
family workers. 

Under-enumeration of employment by the 1940 Census of Population 
and by the MRLF is indicated by a comparison of employment figures 
obtained in the Census of Population and in the MRLF with those ob- 
tained through establishment reports by the Census of Agriculture, 
the Bureau of Labor Statistics, the Bureau of Agricultural Economics 
and the Social Security Board. Estimates of employment based on es- 
tablishment reports would always be expected to exceed estimates 
based on population enumeration because individuals working in more 
than one establishment during a reporting period are counted more 
than once. However, the extent to which the employment estimates 
based on establishment reports have exceeded the estimates based on 
population enumeration has been greater than the expected duplication 
in establishment reports. 
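
The expected duplication can be illustrated with a small example. The establishments and worker identifiers below are hypothetical.

```python
from collections import Counter

# Hypothetical payroll reports: each establishment lists the workers it paid
# during the reporting period.
reports = {
    "plant A": ["w1", "w2", "w3"],
    "store B": ["w3", "w4"],
    "farm C": ["w4", "w5"],
}

# Establishment-based count: one for every payroll entry.
establishment_count = sum(len(workers) for workers in reports.values())

# Population-based count: one for every person, however many jobs are held.
jobs_per_person = Counter(w for workers in reports.values() for w in workers)
population_count = len(jobs_per_person)

# Excess attributable to persons appearing on more than one payroll.
duplication = establishment_count - population_count
```

The argument for under-enumeration rests on the observed excess being larger than a duplication figure of this kind would account for.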

Enumeration experience also indicated that an incomplete employ- 
ment count was being obtained. In March, 1942 each person who was 
neither working nor looking for work was asked if he could take a 
full-time job if one became available within 30 days. This inquiry 
so influenced the responses to the regular labor force questions that 
the estimated level of employment increased by almost a million, and 
the number of persons classified as housewives and students decreased 
by a corresponding amount. In the following months when the MRLF 
interviews were conducted as usual this increase in the civilian labor 
force vanished. In November, 1942 a second inquiry was made into the 
available labor reserve with similar results. 

Repeated evidence of this type, even after careful training of enumer- 
ators in handling supplemental questions, suggested that the additional 
questions did not distort the MRLF results, but rather that they par- 
tially corrected for a consistent under-enumeration of employment. 
As a result of these experiences, it was decided to make a more system- 
atic investigation of the nature and extent of the under-enumeration 
of employment which was occurring. 

The evidence at hand suggested that the question, “Was this per-
son at work at a private or government job last week?” does not ob-
tain a positive response for some employed persons who consider them-
selves to be not workers but housewives or students. Evidence available
also indicated that a large proportion of unpaid family workers were 
not counted as employed because the wording on the schedule sug- 
gests paid employment. In addition, the exclusion of incidental chores 
from the definition of unpaid family work operated to exclude some 
persons working substantial amounts of time. 

Various checks indicated that there was little uniformity from 
enumerator to enumerator or from respondent to respondent in what 
is considered to be incidental chores. It appeared that more consistent 
and reliable results would probably be produced by including chores 
as unpaid family work and then eliminating from the count of unpaid 
family workers those working fewer than a specified number of hours. 
(It is necessary to attempt some exclusion of incidental chores from 
unpaid family work; otherwise an employment count among persons 
living on farms or operating family business approaches a count of 
the population of working age.) 

In the fall of 1944, the Census staff began developing and testing a 
revision of the MRLF schedule with a view to achieving a more nearly 
complete count of all employed persons. By January 1945, we had a 
revised schedule and definitions which sought to correct the difficulties 
in the old schedule as follows: 

The first question on the new schedule merely asks what the person’s 
major activity was during the census week. Most people tend to think 
of themselves as engaged primarily in some one activity, working, 
keeping house, going to school, etc., even though they may also be 
engaged in various other pursuits. The enumerator using the new 
schedule accepts the respondent’s statement of his major activity, 
knowing that some persons in the labor force will not report themselves 
as workers at this point in the interview. Then for all persons whose 
major activity or status is indicated as something other than working, 
the enumerator asks whether, in addition to his stated major activity, 
the person being enumerated did any work for pay or profit during 
the census week. It is at this point that the part-time work of the house- 
wife or student is naturally reported. Note that the old schedule at- 
tempted to obtain a complete count of persons at work by asking the 
single question, “Was this person at work on a private or government
job last week?” The new schedule, on the other hand, uses two ques- 
tions. The first enables the respondent to give his own classification of 
activity status and the second asks specifically whether or not persons 
whose major activity is considered to be something other than work-
ing, did any work during the census week. Many persons were sur-
prised at the effect that this change in the method of obtaining a count
of employment had upon the number of persons reported to be work-
ing. The question on the old schedule appeared to be quite definite
in its meaning. Actually, as we have seen, it did not mean the same 
thing to different people. 

The new schedule definition of unpaid family work includes inci- 
dental chores. As a substitute for the exclusion of incidental chores in 
the old definition, persons working less than 15 hours per week at un- 
paid family work are excluded in the tabulation process from the count 
of those at work. 
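
The two-question sequence and the 15-hour rule can be sketched as a single decision function. The function and argument names are assumptions made for illustration; only the order of the questions and the 15-hour cutoff come from the text.

```python
MIN_UNPAID_FAMILY_HOURS = 15  # cutoff stated in the text


def at_work(major_activity, did_other_work=False, unpaid_family_work=False, hours=0):
    """Sketch of the at-work determination under the new schedule.

    major_activity: the respondent's own answer to the first question
    did_other_work: answer to the follow-up question, asked only when the
        major activity is something other than working
    unpaid_family_work: True when the work reported is unpaid family work
    hours: hours worked during the census week
    """
    worked = major_activity == "working" or did_other_work
    if not worked:
        return False
    # Unpaid family work of less than 15 hours is dropped in tabulation.
    if unpaid_family_work and hours < MIN_UNPAID_FAMILY_HOURS:
        return False
    return True
```

For example, `at_work("keeping house", did_other_work=True, hours=10)` counts a housewife with ten hours of paid work as at work, whereas the same ten hours of unpaid family work would be excluded.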

The new schedule was pretested in April, 1945, in all MRLF sample 
areas throughout the country. The pretest was based on a sample of 
approximately 2,000 households selected at random from the total 
MRLF sample. The sample households were enumerated with the 
old schedule as a part of the regular April enumeration. The following 
week the sample households were enumerated a second time, using the 
new schedule. The information recorded on the new schedule applied 
to the same census week as the old schedule. The employment status 
information from the old schedule together with the information from 
the new schedule was transcribed to punch cards and tabulated so 
as to provide a direct comparison of the employment status informa- 
tion obtained by use of the two schedules. The tabulations indicated 
that the new schedule would increase the count of males employed by 
900,000 and of females employed by 1,600,000. The count of unem- 
ployed among males was reduced by a little over 100,000 and was in- 
creased among females by about an equal amount. Over 90% of the 
additional male workers were under 20 years of age, most of them stu- 
dents. Among females, on the other hand, the additional workers were 
distributed among all age groups and most of them were previously 
classified as housewives. Of the additional workers found in non- 
agricultural industries over 50% were in trade and service activities, 
which are the fields of employment in which part-time work by 
housewives and students is most prevalent. While many of the ad- 
ditional workers worked only part time, the entire group could not 
be characterized as part-time workers. Nearly 50% of the additional 
workers worked 35 or more hours during the census week. 
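
A direct comparison of this kind amounts to cross-tabulating each person's status on the old schedule against the same person's status on the new one. The records below are hypothetical and serve only to show the form of the tabulation.

```python
from collections import Counter

# Hypothetical paired records: (status on old schedule, status on new schedule)
# for the same person and the same census week.
pairs = [
    ("employed", "employed"),
    ("nonworker", "employed"),
    ("nonworker", "nonworker"),
    ("unemployed", "employed"),
    ("nonworker", "unemployed"),
    ("employed", "employed"),
]

# Each cell counts persons with a given old-schedule and new-schedule status.
crosstab = Counter(pairs)

old_employed = sum(n for (old, _), n in crosstab.items() if old == "employed")
new_employed = sum(n for (_, new), n in crosstab.items() if new == "employed")
net_change = new_employed - old_employed
```

The off-diagonal cells show which groups change classification, and the difference between the marginal totals gives the net effect of the new schedule on the employment count.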

About one in ten of those classified as employed on the basis of the 
new schedule reported some other activity, such as home housework 
or going to school, as their major activity. The old MRLF schedule
recorded only about one-half of these persons as working and the 
other half as not working. 

On the basis of the pretest results, it was decided to adopt the new 
schedule beginning in July 1945. In that month the entire sample was 
interviewed first with the old schedule and then with the new schedule 
to obtain a complete check of the difference in results obtained with 
the new as compared with the old schedule. 

In July the differences in response obtained on the new schedule 
as compared with the old were similar to the differences obtained in the 
April pretest. The magnitudes, however, were somewhat different. 
In July the new schedule showed an increase of 1,400,000 in employ- 
ment as compared with 2,500,000 in the April pretest. In large part 
the difference between the April and July results arose from the fact 
that in April school was in session and a large number of employed 
students were misclassified with the old schedule. In July, when 
schools were not in session, this occasion for misclassification did not 
exist. The April pretest also indicated a somewhat larger number of 
misclassified housewives than did the July double enumeration. This 
may have been due to the fact that some enumerators discovered 
during the period of discussion of the new schedule and the April 
pretest that the old schedule was obtaining an under-count of employ-
ment and they, therefore, adopted interviewing techniques that in
part compensated for the under-enumeration of employment obtained 
with the old schedule. To reduce the extent of disruption of the MRLF 
series prior to the adoption of the new schedule, supervisors were 
instructed not to discuss the new schedule nor the reasons for its 
adoption with the enumerators; the April pretest was conducted with 
a small number of enumerators; and those who participated in the 
pretest were instructed not to discuss the results with other enumer- 
ators. 

As we reported earlier, there is evidence indicating that under- 
reporting of employment existed in the 1940 Census of Population as 
well as in the MRLF and in other population enumerations which use 
the same type of questions to obtain a count of employment. The ex- 
tent of under-reporting is much greater in those age and sex groups 
in which a smaller proportion are usually employed. These same age 
and sex groups have contributed most to the wartime expansion 
of the labor force. It seems probable, therefore, that the extent of 
under-reporting obtained with the old schedule was greater in 1944 
and 1945 than in earlier years. 
We are working on a revision of the MRLF data prior to July, 1945,
to bring them in line with the measurements obtained with the new 
schedule after July, 1945. The procedure we are following is to relate 
the size of the adjustment to the level of employment for each age and 
sex group. This will result in a much smaller revision for the earlier
months than for the later months.
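
The logic of such an adjustment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used. It assumes that the adjustment for each age and sex group is a fixed ratio applied to that group's old-schedule employment, with the ratio taken from a double enumeration such as that of July 1945. The group labels and all figures (in thousands) are hypothetical.

```python
def undercount_ratios(old_counts, new_counts):
    """Ratio of new-schedule to old-schedule employment for each group."""
    return {group: new_counts[group] / old_counts[group] for group in old_counts}


def revise_month(old_counts, ratios):
    """Apply each group's ratio to its old-schedule employment count."""
    return {group: old_counts[group] * ratios[group] for group in old_counts}


def total_adjustment(old_counts, revised_counts):
    """Total employment added by the revision, summed over groups."""
    return sum(revised_counts.values()) - sum(old_counts.values())


# Hypothetical double-enumeration results, in thousands of persons.
old_july = {"men 25-44": 20000, "women 14-19": 2000}
new_july = {"men 25-44": 20100, "women 14-19": 2400}
ratios = undercount_ratios(old_july, new_july)

# Hypothetical old-schedule counts for an earlier and a later month.
earlier_month = {"men 25-44": 19000, "women 14-19": 1000}
later_month = {"men 25-44": 20000, "women 14-19": 2000}

earlier_adjustment = total_adjustment(earlier_month, revise_month(earlier_month, ratios))
later_adjustment = total_adjustment(later_month, revise_month(later_month, ratios))
```

With these hypothetical figures the total adjustment is about 295 thousand for the earlier month and about 500 thousand for the later one, because the group with the largest ratio had fewer members employed in the earlier month.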

The new schedule has also affected the measurement of unemploy- 
ment, but to a much lesser extent. Beginning in 1940 the basis for de- 
termining whether or not a person is unemployed was whether or not 
he was actually seeking a job. Persons who would be actively seeking 
work except for the fact that they were temporarily ill or that they 
believed there was no work available in the community, were also clas- 
sified as unemployed and were referred to as inactive job seekers. The 
old schedule obtained a count of these persons by asking those who 
reported that they were not actively seeking work the question, “Why 
not?” In the months just prior to the adoption of the new schedule, 
when unemployment had been reduced to less than one million, nearly 
50 per cent of the total number of unemployed were included in this 
inactive group. The new schedule does not ask the reason for not look- 
ing for work. Rather it relies on the respondent’s answer to the ques- 
tion as to whether or not the person being enumerated is looking for 
work. If persons indicate in the course of the interview with the new 
schedule that they would be looking for work except that they are 
temporarily ill or believe no work available, they are classified as look- 
ing for work. The number of these inactives picked up with the new 
schedule, however, appears to be considerably less than the number 
counted with the old schedule which contained the question as to why 
persons were not actively seeking work. In addition, some of those 
reported as actively seeking work with the old schedule are found with 
the more specific and complete questions on the new schedule to be at 
work in addition to seeking another job. Such persons are classified 
as employed. Finally, the new schedule obtains a more nearly com- 
plete count of housewives and students who are actively seeking work. 
The net effect of all these factors on the count of unemployment ob- 
tained with the new schedule as compared with the old is small, the 
net change being a reduction of some 150,000 in the level of unemploy- 
ment in July 1945. 

Experience with the new schedule since the end of the war is providing
some very encouraging evidence as to the sensitiveness of the
looking-for-work question as a measurement of unemployment. For example,
a large proportion of the women released from war plants immediately
started reporting themselves as not looking for work. Likewise,
a number of men who decided after the war to take a short rest or
vacation from the labor force before again looking for a job were
reported with the new schedule as not looking for work.

Our experience with measurement during the last few years empha- 
sizes the difficulty of obtaining a full count of persons having a given 
characteristic—for example those who did any gainful work during a 
given week. Since such counts will include persons having a consider- 
able range of characteristics, it is important to develop, as a part of 
measurement, a comprehensive and carefully worked out classification 
procedure that will provide the subdivisions of employment and un- 
employment that are needed for various uses and purposes. 

At the same time, there are definite limitations on the refinement 
that can be introduced successfully into labor force classification based 
on a recurring enumeration of a sample of households. What sort of 
criteria have developed out of our experience? 

First: The questions on which classifications are based must not 
arouse antagonism and must be such that they can be asked month 
after month. (In order to provide greater continuity in the data, MRLF 
households are enumerated monthly for about 6 months.) One of the 
reasons for the difficulty with the old schedule was that the enumerator 
was supposed to ask of all persons not working or looking for work, 
their reasons for not looking for work. Because this proved to be awk- 
ward in many cases, some enumerators probably did not ask the ques- 
tion, but classified the respondent by observation. 

For the same reasons, the classification should not depend on a long 
series of questions which will discourage continued cooperation. Enu- 
merators are only human and are likely to adopt short-cuts in order to 
avoid serious opposition, particularly when they have to return to the 
same household the next month. 

Second: The questions must be objective and designed to provide 
approximately the same answers no matter who asks or answers 
them. In the great majority of cases, the information is furnished 
by the housewife. She can give fairly adequate replies to questions on 
whether the members of her family are working or looking for work, 
the approximate amount of time they worked and the type of industry 
in which they worked. She cannot be expected to give equally reliable 
answers to questions of the intentions, preferences, or attitudes of other 
members of the household. For example the question, “How many hours a 
week does your daughter want to work?” might be answered quite 
differently if asked of the daughter herself. 

Third: The questions must not strain too much the respondent’s 
memory. This criterion applies to the recency of the date or period of 
time to which the information requested applies, the length of the 
period of time to which the question applies, and the detail in which 
the information is requested. 

Fourth: Concepts, definitions, and instructions must be simple and 
conform as nearly as possible to common usage. For example, we ob- 
tain a measurement of unemployment by simply asking whether a 
person is looking for work rather than by specifically asking whether 
or not a person was engaged in one or more of such activities as plac- 
ing or answering ads, writing letters, applying at a factory personnel 
office, registering with the USES, etc. Of course, in instructions to 
enumerators it is necessary to indicate the criteria to be used if the 
respondent is not certain whether or not some member of the household 
is looking for work. 

In setting up a labor force classification we must take into account 
not only what kind of questions will yield reliable and useful answers 
but also, in the case of sample surveys, what types of estimates can 
be provided by a sample of a given size. 

Our fifth criterion, then, is that the classification should be suffi- 
ciently broad so that the month to month changes in estimates are sig- 
nificant and can be distinguished from sampling variation. For ex- 
ample, it has been proposed that unemployed married women whose 
husbands are earning $25 a week or more constitute an important 
group to break out from the total number looking for work, on the 
theory that their unemployment is not so serious as an indication of 
failure to achieve full employment. Even if the premise were agreed 
to, it would be a mistake to adopt such a classification if the sample 
cannot provide reliable estimates of the size of this group from month 
to month. Unwarranted conclusions would be drawn from changes 
which are only the result of sampling variation. 
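
A rough check of this criterion can be sketched as follows. The sketch rests on assumptions that are not part of the survey design described here: it treats each month's sample as a simple random sample of `n` persons, treats the two months as independent (ignoring the overlap that arises because households stay in the sample for several months), and uses a conventional threshold of two standard errors. The sample size and proportions are hypothetical.

```python
import math


def change_is_distinguishable(p_prev, p_curr, n, z=2.0):
    """Return True if a month-to-month change in an estimated proportion
    exceeds z standard errors of the difference.

    Assumes simple random sampling and independent monthly samples of size n.
    """
    var_prev = p_prev * (1 - p_prev) / n
    var_curr = p_curr * (1 - p_curr) / n
    se_change = math.sqrt(var_prev + var_curr)
    return abs(p_curr - p_prev) > z * se_change


# Hypothetical sample of 25,000 persons each month.
# A broad class: 3.0 per cent of persons one month, 3.5 per cent the next.
broad_class = change_is_distinguishable(0.030, 0.035, n=25000)
# A narrow class: 0.20 per cent one month, 0.25 per cent the next.
narrow_class = change_is_distinguishable(0.0020, 0.0025, n=25000)
```

Under these assumptions the change in the broad class stands out from sampling variation, while the change in the narrow class does not, even though it is a rise of one quarter in relative terms.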

Again, differentiation is possible at certain times but not at others. 
The breakdown of the unemployed into the groups suggested in the 
previous paper might be proper when the level of unemployment is 
high but not when the level of unemployment is low. The groups in 
the population which it would seem desirable to differentiate because 
they do not have the characteristics generally associated with the 
majority tend to be small in size and therefore impossible to estimate 
adequately with a small sample. 

These cautionary words do not mean that a wide variety of differ- 
entiation is not possible if obtained occasionally and on the basis of 
special inquiries. Questions which would end all types of cooperation 
if asked each month can be asked once during the period in which a 
household is in the sample. Occasionally it would be possible to pay 
the cost of interviewing each adult in a household, so that he can re- 
port directly his present attachment to the labor force, his earnings if 
employed, and the type of work he is looking for if unemployed. Finally, 
the characteristics of very small segments of the population can be 
estimated from time to time, and although the estimates will be sub- 
ject to a large sampling error, they will still be useful as an indication 
of the approximate size of the group, in relation to other groups and 
to the total. 

The types of differentiation proposed in the preceding paper are, for 
the most part, consistent with the criteria proposed. Geographical dif- 
ferentiation and that related to such factors as age, sex, and marital 
status present no enumeration difficulties. The size of the present 
sample, however, will yield only national estimates. Funds are being 
requested to enable estimates to be made on a quarterly basis for 50 
metropolitan areas and for the larger States. The expanded sample 
would also make it possible to provide much more extensive and reliable 
breakdowns of the national data. 



EMPLOYMENT STATISTICS IN THE PLANNING OF A 
FULL-EMPLOYMENT PROGRAM 


CHARLES STEWART 
Bureau of Labor Statistics 


AND 


LORING WOOD 
Bureau of the Budget 


The term “full-employment program” usually means a 
program designed to maintain the general demand for labor. 
Unemployment can be reduced by such a program only to a 
certain point; beyond that point an increase in the general 
demand for labor will be relatively ineffective and will have 
undesirable repercussions. In setting a goal for a full-employ- 
ment program we are essentially expressing a judgment on 
how far we can safely go in reducing unemployment by in- 
creasing general demand. In translating this judgment into 
figures, the best guide is our historical experience. 

The essence of a full-employment program is the coor- 
dination of a multitude of governmental activities to achieve 
a quantitative result. This requires that the goal, initially ex- 
pressed in employment terms, be translated into a compre- 
hensive national budget. The construction of such a budget 
requires essentially the identification and projection of causally 
significant relationships. For this purpose, the primary sta- 
tistical need is that economic data in various fields be made 
more comparable and consistent. 

Objectives expressed in terms of national aggregates of 
employment, unemployment and expenditure are appropriate 
in the long-range planning of economic policy, but alone will 
not be sufficient for current appraisal of the adequacy of the 
measures taken. For this purpose we cannot rely on any single 
set of data. We must rather undertake a continuing analysis 
of all information which may throw light on the adequacy 
of our employment objectives and the meaning of current 
levels of employment and unemployment in the context of 
the developing situation. 


RECENT discussion of full-employment policy places upon statisticians
and economists a formidable task of developing the types 
of statistical analysis needed in its planning and execution. The task is 
not so simple as the discussion might seem to imply. The preceding 
papers have dealt with problems and recent progress in refining some 
of the statistical data which will be needed for such purposes. This 
paper deals more directly with some of the problems arising in connection
with the use of labor force and employment statistics in the planning
and administration of a full-employment program. 

Three types of problems will be considered. First, there are problems 
of defining quantitatively the objective of full-employment policy, in 
terms of employment and unemployment. Second, there are problems 
centered around the development of the national budget as the central 
tool in economic policy-formation. Finally, there are problems con- 
nected with the use of statistical data in determining currently how 
closely the full-employment goal has been approached. 


I 


A common approach to the definition of full employment makes use 
of the relationship between the number of persons unemployed and the 
number of unfilled job openings. Beveridge, for example, suggests that 
full employment means “more vacant jobs than unemployed men.”1 
To anyone concerned with problems of measurement, this type of defi- 
nition raises serious questions. It is difficult enough to determine when 
a given individual is to be classified as unemployed, but there is at 
least no question that he is an individual, and, once his proper classification
has been determined, he can be counted. A “job opening,” however, 
is something much more tenuous; it may represent only the desire of 
an employer to interview prospective workers in order to be able to hire 
promptly at some future time in case the need should materialize. 

We may get further if we ask why it is necessary, in setting up a 
“full employment” objective, to allow for any unemployment. It is 
doubtless true that unemployment of two million will involve some- 
what less hardship to the individuals involved than will unemployment 
of 15 million. But this is not in itself a justification for taking two 
million unemployed, rather than one million or 100,000 as our “full 
employment” goal. The justification must be sought in the nature of 
the program for which the goal is being set. 

The term “full-employment program”, in the context of most current 
discussion, has a somewhat more specific connotation than the words 
themselves seem to imply. It ordinarily means a program designed 
primarily to maintain the general demand for labor, usually by main- 
taining the demand for goods and services. While some consideration 
may be given to demand for labor in specific regions and industries, 
emphasis is generally on the national picture. Measures designed spe- 
cifically to improve the organization of the labor market—improvement 
of employment services, elimination of seasonal variations, or reorganization
of casual industries—are usually thought of in a separate
category. 

1 Sir William H. Beveridge, Full Employment in a Free Society, New York, W. W. Norton and Co., 1945, p. 19.

Increased demand for labor will undoubtedly result in some reduc- 
tion in unemployment under any conditions. But if unemployment is 
already low, the effect will be relatively small, and serious inflationary 
pressure may result. 

In setting a “goal” for a full-employment program, we must take 
account of the limitations of the program. The allowance for “mini- 
mum” or “frictional” unemployment which we make will vary, of 
course, depending on the character of the program itself and, in par- 
ticular, on the extent to which it includes measures designed to im- 
prove the organization of the labor market. At best, however, the 
figure will represent a judgment on how far we can safely go in reduc- 
ing unemployment through increasing demand. 

In translating this judgment into figures, the best guide is probably 
our historical experience. At present, the only period of relatively full 
employment for which data are available is the war period, which is 
not an adequate guide to peacetime conditions. It is relevant, never- 
theless, to consider that unemployment, as measured by the Census 
series, was below one million during most of the period from mid-1943 
through mid-1945. We should probably not choose to face in peace- 
time the problems of labor shortage which characterized this period, 
and to invoke the manpower and price controls which they necessitated. 
The earlier part of the war period, before labor shortages became acute, 
may provide a better guide. 

It is important to recognize that any projected unemployment or 
labor force figure has meaning only with reference to a specific set of 
definitions and techniques of measurement. The preceding papers have 
suggested the extent to which relatively minor changes in procedures of 
enumeration may affect the resulting estimates. 


II 


An objective defined only in terms of unemployment is hardly ade- 
quate as a guide to the development of economic policy. We need also 
to know what the goal implies in terms of employment, production, and 
national income. It may be necessary, indeed, to go much beyond this 
to the preparation of detailed estimates of income, consumption, sav- 
ings, and investment, and perhaps even estimates of production and 
employment for individual industries. 

It may not be obvious why such estimates are relevant to a program 
primarily concerned with employment. But the term “full employment”
refers to the objective of the program, and not to its scope. What 
is implied by the phrase “full-employment program” is a coordination 
of Federal economic and fiscal policy in the interest of maintaining 
employment. 

The essential characteristic of the program is thus the coordination 
of Federal activities in a multiplicity of fields which impinge on the 
operation of the economy. It involves, moreover, their coordination to 
achieve a quantitative result—a given level of demand for labor neither 
too high nor too low. The planning and execution of such a program will 
clearly require the preparation of comprehensive estimates covering 
the whole range of economic activity. The term “National Budget,” 
frequently used to describe such estimates, is a fairly exact metaphor. 

A full-employment budget starts with a forecast of the labor force, 
based on the latest current data and allowances for anticipated future 
developments. After deducting whatever unemployment figure is felt 
to be consistent with the full-employment objective, this yields a goal 
expressed in terms of total employment. From total employment we 
pass to estimates of production and income; and thence to an estimate 
of consumption, investment and total expenditures. The comparison 
between total expenditures and total production provides a test of the 
adequacy of demand to sustain full employment, and the underlying 
estimates provide, in greater or less detail, a description of the asso- 
ciated pattern of economic activity. 
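
The chain of estimates just described can be sketched in Python. This is an illustrative skeleton only. The conversion from employment to production through average hours and output per man-hour, and the treatment of consumption as a fixed share of production, are simplifying assumptions made for the example, and every figure is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BudgetAssumptions:
    labor_force: float              # forecast, millions of persons
    unemployment_allowance: float   # millions consistent with the goal
    hours_per_year: float           # average hours worked per person
    output_per_man_hour: float      # dollars of product per man-hour
    consumption_share: float        # assumed share of production consumed
    investment: float               # billions of dollars
    government: float               # billions of dollars


def full_employment_budget(a: BudgetAssumptions) -> dict:
    """Pass from the labor force forecast to a demand-adequacy test."""
    employment_goal = a.labor_force - a.unemployment_allowance
    # millions of persons * hours * dollars = millions of dollars; convert to billions
    production = employment_goal * a.hours_per_year * a.output_per_man_hour / 1000
    consumption = a.consumption_share * production
    expenditure = consumption + a.investment + a.government
    return {
        "employment_goal": employment_goal,
        "production": production,
        "expenditure": expenditure,
        "gap": production - expenditure,  # positive: demand falls short
    }


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
budget = full_employment_budget(BudgetAssumptions(
    labor_force=60.0, unemployment_allowance=2.0,
    hours_per_year=2000.0, output_per_man_hour=1.5,
    consumption_share=0.7, investment=25.0, government=20.0,
))
```

A positive gap in this sketch means that projected expenditure falls short of full-employment production, that is, that demand would be inadequate to sustain the employment goal.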

The problems involved in the construction of such a budget are in 
essence problems of identifying and projecting significant economic 
relationships. The task is complicated by the fact that most of the data 
which must be used were not designed for such purposes, and in many 
cases cannot be easily adapted to it. Two examples, both involving 
labor force and employment statistics, will serve to illustrate the sort of 
statistical problems which are encountered. 

In passing from the estimate of total employment to estimates of 
production and income, we must make use of the projections of hours, 
man-hour productivity, and hourly or weekly earnings. Here we 
encounter the difficulty that our employment data are of two funda- 
mentally different sorts, based on different concepts and definitions of 
employment. Data on labor force and unemployment are necessarily 
based on reports from the workers themselves (or members of their 
households) on the basis of which each individual is classified according
to his employment status. But such data do not lend themselves 
to the study of changes in productivity, hours, and earnings. For this 
purpose, it is necessary to make use of reports from employers which 
provide data on the number of persons on their pay rolls, their aggre- 
gate earnings, and the aggregate number of man-hours worked. Such 
reports necessarily require the use of a somewhat different definition of 
employment, and the estimates of total employment derived from these 
reports may differ materially from those based on a household survey. 

Lack of comparability between different series of data is not always 
traceable to such fundamental differences in techniques of measure- 
ment, however. The statistical work of the Federal Government was 
not designed to serve the needs of national economic policy-making. 
Rather it has developed gradually in response to much more particu- 
larized needs and interests. As a result, estimates of different but closely 
related things, prepared by different agencies for different purposes, 
may not be consistent with respect to scope, definitions, and classifica- 
tion even when the basic data are drawn from the same or similar 
sources. 

Current data on hours and earnings in nonagricultural industries are 
available only from BLS reports, which also provide data on employ- 
ment and aggregate wages. Trends in productivity, hours, and earnings 
must therefore be studied largely on the basis of these data. But to use 
the results of such studies in the construction of national budgets, they 
must be fitted into the structure of the Commerce Department esti- 
mates of national income. The estimates of the wage and salary com- 
ponent of national income, though based largely on the same sources 
as the BLS estimates of employment, are not directly comparable 
with the BLS estimates with respect to classification, or even with re- 
spect to total level and trend. Each agency has used the classifications 
and methods of estimation and adjustment it felt best adapted to its 
own needs, using the best data available at the time. This particular 
case is cited not because it implies criticism of either of the agencies 
involved; they have recognized the problem and are now seeking to 
achieve consistency. But it is illustrative of the problems involved in 
adapting to the needs of national economic policy-making a series of 
statistical programs that have developed largely in response to different 
needs. 


III 


The long-range planning of a full employment program may be car- 
ried on with primary reference to objectives expressed in terms of gross 
national product or expenditures. But as the expenditure goal is approached,
it will be necessary to determine whether in fact the accompanying
employment represents full employment, under-employment, 
or over-employment. Presumably the objective would also be stated in 
terms of employment. But that goal likewise is tentative in character, 
and requires periodic checking to determine whether it actually repre- 
sents, at any given time, the most accurate measure of the full-employ- 
ment objective. 

As and if we approach the stated goal, how do we know if the em- 
ployment goal is actually consistent with the ultimate objective—the 
employment of the Nation’s labor force? Until 1940 only decennial labor 
force data were available, which afforded scanty material for determin- 
ing labor force norms. Estimating normal labor market participation 
rates and projecting actual labor force aggregates is still a virgin field 
of research. It is obvious, particularly at the present time, that fore- 
casts of the probable size of the labor force for even one year in advance 
cannot be precise. 

If we should fail by a wide margin to reach full employment, the 
facts of the situation would probably be clear, and more refined statis- 
tical analysis would hardly be necessary. But if we should approach 
full employment more closely, a more careful analysis would be re- 
quired, and under some circumstances a reappraisal of the goal would 
be necessary. If, for example, the actual labor force, as shown by the 
Census Bureau’s Monthly Report on the Labor Force, should prove to 
be substantially below the estimated labor force as used in determining 
the employment goal, downward revision would be required. On the 
other hand, if the estimates of required employment should prove to 
be too conservative, the continued use of too low a figure for the full 
employment objective would understate the real employment needs of 
the Nation and could result in a failure to take necessary action to 
utilize our real labor resources. 

At this point we are primarily concerned with the problem of de- 
termining how we know whether or not we have full employment when 
we are at high levels of economic activity. We have to “get behind” the 
statistics and appraise their meaning in terms of the general labor 
market and economic situation. We need to provide for the continuous 
analysis of all relevant information which may throw light on the 
adequacy of the employment objective and the meaning of the prevail- 
ing levels of employment and unemployment in the context of the de- 
veloping situations. More specifically, we need to examine periodically 
the reasonableness of the allowance made for frictional unemployment, 
the current level of the labor force as against projected levels, and 
questions centering around the degree of under-employment of employed
workers. 

The volume of unemployment shown by the Census Bureau’s 
Monthly Report on the Labor Force does not provide us with a con- 
clusive answer as to whether the current employment level is actually 
approximating full employment when unemployment is relatively low. 
The primary reason for this ambiguity is the problem of appraising the 
volume of unavoidable frictional unemployment in any given situation. 
If this were readily determinable there would be strong reasons for 
adopting unemployment as the key measure—i.e., for using a stated 
minimum volume of unemployment as the objective. Probably the 
best clues as to whether unemployment at any given time is largely 
frictional can be found in evidences of labor shortages in some parts of 
the economy or in pressures toward rising prices for the factors of pro- 
duction. Hence over-all appraisal of the general economic situation must 
be relied upon to determine if the volume of unemployment reported 
at any given time is essentially frictional in character. It is apparent 
that the frictional component of unemployment cannot be readily iso- 
lated and may vary widely from time to time. Experience may prove 
that any rough allowance made in advance—for purposes of project- 
ing an employment goal—may be so far off as to result in a serious dis- 
tortion of the employment objective. 

A second type of problem emerges from the possibility of over-state- 
ment of the size of the labor force as estimated currently on a house- 
hold enumeration basis. The Census estimate of the number of persons 
seeking work but without jobs has the virtue of reflecting the re- 
spondents’ own declarations that they are seeking work—a virtue 
which may give rise to the criticism that it is too subjective in character. 

The general alternative to revising the employment goal to take 
account of the actual changes in the size of the labor force is to base 
the employment goal on long-run historical trends as to the proportions 
of the population of working age in the labor force. This may mean, in 
effect, that we hold to an arbitrary employment goal which has at best 
some valid relationship to labor market participation rates in 1940, and 
ignores marked changes which may have occurred since that time. 
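
The two approaches can be set side by side in a short sketch. It assumes that the trend-based labor force is obtained by applying fixed participation rates to the population of each group, and that the goal is revised only when the reported labor force departs from that projection by more than a chosen tolerance. The groups, rates, tolerance, and figures (in millions) are all hypothetical.

```python
def projected_labor_force(population, participation_rate):
    """Labor force implied by fixed participation rates for each group."""
    return sum(population[g] * participation_rate[g] for g in population)


def goal_revision(projected, reported, tolerance):
    """Change to apply to the labor force figure underlying the employment
    goal; zero when the reported figure is within tolerance of the projection."""
    difference = reported - projected
    return difference if abs(difference) > tolerance else 0.0


# Hypothetical population of working age and base-year participation rates.
population = {"men": 50.0, "women": 52.0}
base_year_rates = {"men": 0.8, "women": 0.25}

projected = projected_labor_force(population, base_year_rates)
revision = goal_revision(projected, reported=55.5, tolerance=1.0)
```

Here the fixed rates give a projected labor force of 53.0 million. Holding to that figure would ignore the 2.5 million by which the hypothetical reported labor force exceeds it, which is the risk noted above.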

Any dangers inherent in the possibility of an overstatement of the 
numbers of persons reported currently as seeking work may be guarded 
against by continuous analysis of a wide variety of labor market in- 
formation. An analysis of the age-sex composition of the labor force as 
reported in the Monthly Report on the Labor Force, for example, will 
reveal whether the numbers of youths, women, and older workers show 
any unusual developments. It is possible, further, to examine the geographic
and industrial composition of unemployment. The Monthly 
Report on the Labor Force does not provide the basis for such an 
analysis because it is not designed, at present, to show the geographic 
or industrial distribution of unemployment. But this can be ap- 
proached by analysis of unemployment compensation claims data and 
analysis of changes in employment, by industry, by state, and by area, 
as shown by the Bureau of Labor Statistics reports, as well as by Em- 
ployment Service data on job applicants and referrals. 

The unemployment compensation data refer, of course, only to 
covered employment, but would reveal geographic concentrations as 
well as affording proof of availability and employability. Analysis of 
the BLS employment data, while not conclusive, would suggest whether 
excessive disemployment had occurred in any particular segment even 
if unemployment nationally was reasonably low. Employment Service 
data would reveal the age, sex, experience and other characteristics of 
job applicants as well as changes in job opportunities by area and 
industry. 

Data of these kinds may throw little light on the question whether the 
prevailing volume of unemployment results from lack of adequate 
over-all demand for labor or from frictional causes. To the extent that 
the data point to regional or industrial concentration of unemploy- 
ment, however, this would suggest that the unemployment arises from 
immobility of labor or other factors of production and from imperfect 
organization of the labor market. From the point of view of remedial 
action, this would indicate special measures rather than broad fiscal or 
other measures directed toward increasing aggregate demand. But in 
any case, such review and analysis of all available labor market in- 
formation would provide a check on the validity of the current esti- 
mates of the size of the labor force as shown by the Monthly Report on 
the Labor Force. 

A third problem, somewhat different in character, is the existence of 
conditions of under-employment—e.g., employment in marginal ac- 
tivities or at a workweek shorter than desired—as a result of lack of 
real full employment opportunities. With employment at relatively 
high levels, it may prove difficult to demonstrate the facts satisfactorily 
enough to justify remedial action. Information as to weekly hours of 
work is available from the Bureau of Labor Statistics monthly reports
for many industries and from the Monthly Report on the Labor 
Force for a cross section of the population. Many individuals desire 
part-time employment, of course, and even where a marked increase 
in such employment can be shown, the meaning is not altogether clear 
except in periods of recession or depression. Further research and information
would be required, as indicated in a previous paper, for 
adequate evaluation of the existence of under-employment under condi- 
tions of relatively high employment. The problem is significant, how- 
ever, because the existence of under-employment presents evidence as 
to lack of over-all demand for labor as a factor impeding the most 
efficient utilization of the Nation’s labor resources. 





MEASURING AND FORECASTING CONSUMPTION* 


FRANK R. GARFIELD 


During the past decade, under the stimulus of changed 
economic conditions and changed attitudes toward policy 
formation, real progress has been made in the measurement of 
consumption in the United States. Concepts have been clari- 
fied considerably, techniques for gathering information have 
been improved, some information has been collected, and a 
certain amount of lay as well as professional support has been 
developed for further use of public funds to measure consump- 
tion. Work in this field, however, is still in a pioneering stage, 
particularly with reference to the measurement of changes in 
the physical volume of consumption. 

One purpose for which information on consumption has 
been used increasingly during recent years is the forecasting 
of changes in consumption and in economic conditions gen- 
erally. Consumption has been widely regarded as a variable 
dependent in quite regular fashion on certain other variables, 
particularly consumer income after taxes and increments 
therein. This interpretation has facilitated calculations of 
hypothetical figures for future gross national product and for 
future employment; and has contributed also to conclusions 
reached about prospective aggregate demand and aggregate 
supply, and prospects for inflation. 

In the author’s view, generalizations about the relationships 
between consumption and other elements in the economic 
situation have been made without adequate recognition of 
differences in behavior resulting from differences in time, place, 
and circumstance. This accounts in part for the shortcomings 
of many forecasts made last summer. Five suggestions con- 
cerning forecasting are offered for consideration: (1) that 
consumption in any particular period be estimated after close 
study of the prospects for component parts, with income 
after taxes being regarded as only one (very important) factor 
influencing consumption; (2) that more attention be given to 
accurate measurement of the current level of the physical 
volume of consumption as a starting point for estimating 
future changes in consumption; (3) that the common assump- 
tions about constant prices be modified to suit the occasion 
and that the whole approach to the problem of estimating the 
probable course of consumption, income, employment, and 
prices be re-examined and revised to take adequate account 
of all the important factors in the market; (4) that very real 
limits to what can be accomplished in forecasting by describing
the economic world in terms of mathematical relationships
be recognized and that the importance of selective judgments
with respect to particular periods be emphasized; and (5)
that, in view of the present uncertain state of the art of forecasting,
care should be taken in formulating policy to rely only
so far as is necessary on forecasts arrived at by any method.

* Paper delivered before the American Statistical Association, Cleveland, Ohio, January 25, 1946.


DURING the past decade considerable pioneering work has been done
in this country in the measurement and forecasting of consumption,
and the bibliography of this field is now quite extended, much
more so than I supposed when I accepted this assignment. So far, 
fortunately, crystallization on any single viewpoint has been avoided 
for the most part, both in measurement and forecasting, and develop- 
ment of new ideas as to objectives and methods has been spirited. 
About all I hope to do this morning is to remind you of this and to sug- 
gest some of the broader issues which seem to merit further attention, 
particularly issues relating to forecasting. 

Measuring consumption. Great impetus was given to measurement in 
the field of consumption by the marked deterioration in living condi- 
tions which occurred in the early 1930’s and by the widespread modi- 
fication in that period of the laissez-faire views of earlier, more prosper- 
ous days. Costs of investigation were regarded in many quarters as no 
impediment—in fact, surveys of all sorts were encouraged as a means of 
further unbalancing the Federal budget and providing employment. 
There was keen interest in what could be done by the Government to 
improve living conditions, and especially the lot of the lowest one-third. 
The Consumer Purchase Study of 1935-36 provided information on a 
scale never before attempted in this field, including detailed analysis 
of the consumption of 60,000 families classified by income group, degree 
of urbanization, and other characteristics. In some particulars, such 
as the coverage of high income groups, this study was not wholly 
satisfactory, but it marked a real turning point in the whole history of 
the measurement of consumption. Information was obtained not only 
on dollar expenditures but also on what consumers were able to obtain 
for their money in real or physical terms. 

Since this initial comprehensive survey there have been further 
studies of a similar sort in 1941 and again in 1944, but the samples have 
been extremely small and, even though sampling techniques had im- 
proved meanwhile, the results have been of only limited value. The war, 
it may be noted, called for action to restrict consumption rather than 
to expand it and to conserve labor rather than to provide employment. 
The gathering of information not directly essential to the war program 
was generally curtailed. 
Meanwhile, since the early 1930’s analysis of current economic
developments has been extended in new directions. The focus of many
recent studies has been development of an integrated system of broad 
aggregates covering all activities in terms of income flows and expendi- 
tures, whereas in the 1920’s the effort was to develop measures of 
changes in strategic points in the economy and not to formalize esti- 
mates of changes elsewhere. Development of this new system has led 
to considerable work in estimating annual consumer expenditures for 
consumption items and estimates have been made largely on the basis 
of biennial Census data for production and less frequent Census data 
on distribution. Work in this field has provided such expenditure esti- 
mates over a considerable period of time in quite some detail by com- 
modity groups but not by type of purchaser. The concepts and pro- 
cedures were determined with a view to fitting the results into the 
general analysis of gross product, and the matter of physical volume of 
consumption was not a primary consideration. 

For purposes of current analysis it appeared necessary to have data 
for much shorter periods than years and the Department of Commerce 
has developed a series on the basis of much more limited data than those 
available for annual compilations. These figures have certain limitations 
as a measure of current monthly changes in consumer expenditures and 
can be interpreted as representing changes in the physical volume of 
consumption only through a price deflation process. The subject of 
price deflation, especially for the war period, is still one of sharp debate, 
and differences in point of view as to proper deflation have contributed 
to quite different views as to the current level of consumption and the 
prospect for changes therein. 
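
The deflation step referred to here can be illustrated with a minimal sketch. The expenditure figures and both price indexes below are hypothetical, chosen only to show how two defensible deflators can give opposite readings of one dollar series; the function name `deflate` is likewise an assumption made for illustration.

```python
def deflate(expenditures, price_index, base=100.0):
    """Convert dollar expenditures to constant-price terms using a price index."""
    return [e * base / p for e, p in zip(expenditures, price_index)]


# Hypothetical dollar expenditures for two periods (billions of dollars).
nominal = [60.0, 66.0]

# Two hypothetical price indexes (first period = 100) that differ only in
# how much price advance they record for the second period.
low_index = [100.0, 104.0]
high_index = [100.0, 112.0]

real_low = deflate(nominal, low_index)
real_high = deflate(nominal, high_index)

print("deflated with lower index: ", [round(x, 2) for x in real_low])
print("deflated with higher index:", [round(x, 2) for x in real_high])
```

With these assumed inputs the same rise in dollar expenditures reads as an increase in physical volume under the lower index and as a decrease under the higher one, which is the kind of disagreement the paragraph describes.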

Changes in living conditions—the physical volume of consumption— 
received some attention in the planning of the war program and more 
recently have received important consideration in planning the settle- 
ment of war claims and the floating of loans by this country. In this 
connection a “Special Combined Committee” set up by the Combined 
Production and Resources Board made a careful study of existing in- 
formation concerning changes in living conditions during the war 
period in the United States, the United Kingdom, and Canada and did 
what they could to make comparisons of absolute levels of consumption 
in the three countries. The chairman of the group making this study 
is the chairman of this meeting and any questions on this may well be 
addressed to him. It may be noted, however, that this study was an 
attempt to get at changes in living conditions and not simply changes 
in consumer expenditures; and that it was an attempt to do this by 
building up physical volume estimates directly as well as by deflating
expenditure series. It was, moreover, an attempt to make international 
comparisons meaningful through appropriate reclassification of data 
available in three different countries and through discreet handling of 
price deflation problems of peculiar difficulty. 

Finally, in order to estimate the importance of increased consumer 
holdings of liquid assets in affecting postwar spending, opinion surveys 
have been undertaken in an experimental way. It remains to be seen 
whether intentions to buy can be taken as seriously as intentions to 
plant. 

These observations as to measurement of consumption have been 
made to indicate some of the many purposes for which information has 
been sought, some of the many types of information obtained, and some 
of the broader limitations of available data for purposes of historical 
analysis and of forecasting. In passing, perhaps apology should be 
made to certain people for talking as if no one thought of consumption 
before the Great Depression—particularly to such as Mr. Engel, the 
1919 budget study people, the people who initiated the collection of 
monthly department store sales data in 1919, and the authors of 1,500 
family expenditure studies made here and abroad before the Consumer 
Purchase Study of 1935-36. 

Forecasting consumption. At this point the discussion narrows, in one 
sense, because the forecasting of consumption as a part of the total 
economic situation is only one of the many purposes for which informa- 
tion about consumption is useful. Nevertheless, the issues involved in 
forecasting are themselves pretty broad and forecasts at times are im- 
portant in determining policies, or at least in rationalizing them; the 
time when forecasts were recognized as of significance only by those 
who hoped to turn a quick penny in capital markets is long since past. 
The period since the end of the war has been one in which widespread 
forecasts of quick deflation may have hastened the process of decontrol 
and of tax reduction. Forecasters who made such predictions may well 
ponder whether the influence of their forecasts has been in the right 
direction and whether there is anything they can do to improve their 
forecasting methods for the future. They are, as indicated by these 
sessions, quite aware of the problem. One particular question is how 
satisfactory estimates of consumption have been and how they might 
be put on a better basis. In approaching this question it may be recalled 
that until very recently no attempt had been made to estimate total 
consumption or to forecast the course of consumption, except in the 
most general terms; consequently it would be strange indeed if the 
results achieved or the methods developed were as good as might be
hoped for on the basis of more experience. 

One of the most basic facts about the course of consumption in a 
country developed beyond the stage of recurrent famines is the rela- 
tively stable nature of total consumption, in real terms, during most 
peacetime periods. This is characteristic of total consumption in the 
sense of consumer purchases and somewhat more so of total consump- 
tion in the sense of consumer use of goods, including durable goods they 
have on hand. (The usual figures, incidentally, relate to a combination 
of the two; purchases of new automobiles are taken as the measure of 
automobile consumption while rent payments and certain costs of 
home ownership, rather than outlays for new houses, are taken as the 
measure of services rendered by houses.) Such fluctuations as have 
occurred in consumption—or at least in dollar consumption expendi- 
tures—in most past periods have been closely related to changes 
in income, though considerably less marked. 

Observation of such facts about consumption in recent years has led 
many analysts to the conclusion that in full employment models—a 
rather recent invention—or in forecasts of actual conditions, figures for 
consumption could be derived from figures for consumer income after 
personal taxes, with minor allowances for the influence of factors other 
than changes in disposable income. Because of the stability of consump- 
tion, and the closeness of records for almost every year to one or 
another indicated relationship between income and consumption, it has 
been thought that any errors in forecasting consumption would be 
small, percentagewise. The great size of the consumption component of 
total expenditures and the great effect of small percentage errors at this 
point have not received much attention; consumer expenditures in the 
fourth quarter of 1945 were at an annual rate of 107 billion as compared 
with 180 for gross product, and ordinarily are a larger part of the total. 
Nor has adequate attention been given to the impossibility of avoiding 
the consequences in the consumption estimates of errors in estimates of 
other components. For “normal” times the analysis of changes in con- 
sumption runs in terms of changes in derived consumer demand, with 
adequate productive capacity assumed. 
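
The point about small percentage errors in a large component can be made concrete with a short calculation. The two dollar figures are those quoted above for the fourth quarter of 1945; the 3 per cent error is a hypothetical input chosen only for illustration, and the function names are assumptions of this sketch.

```python
# Figures quoted in the text: annual rates, billions of dollars, Q4 1945.
CONSUMER_EXPENDITURES = 107.0
GROSS_PRODUCT = 180.0


def consumption_share(consumption, gross_product):
    """Share of gross product accounted for by consumer expenditures."""
    return consumption / gross_product


def gross_product_error_pct(consumption, gross_product, consumption_error_pct):
    """Per cent error in gross product caused by an error in consumption alone."""
    dollars_off = consumption * consumption_error_pct / 100.0
    return 100.0 * dollars_off / gross_product


share = consumption_share(CONSUMER_EXPENDITURES, GROSS_PRODUCT)

# Hypothetical 3 per cent error in the consumption estimate.
assumed_error_pct = 3.0
error_dollars = CONSUMER_EXPENDITURES * assumed_error_pct / 100.0
error_pct_of_total = gross_product_error_pct(
    CONSUMER_EXPENDITURES, GROSS_PRODUCT, assumed_error_pct
)

print(f"consumption share of gross product: {share:.1%}")
print(f"error in billions of dollars: {error_dollars:.2f}")
print(f"error as per cent of gross product: {error_pct_of_total:.2f}")
```

Under this assumed error, a miss that looks small percentagewise amounts to more than three billion dollars, before any allowance for errors carried over from other components.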

In compiling forecasts of consumption and production for the im- 
mediate postwar period, it has been generally recognized that output 
of automobiles and other durable goods for consumers would be re- 
stricted by technical reconversion problems and that consequently con- 
sumer demand could not be regarded to the usual degree as a determin- 
ing factor in the volume of production in this field. It has also been 
recognized that demand for durable consumers’ goods would be to some
extent independent of current income, owing to depletion of consumer 
stocks and accumulation of consumer buying power. 

Production of nondurable goods, however, was expected to be de- 
termined largely by demand factors and consumer demand in this field 
was thought of as being closely related to income. The limitation of 
production volume through supply factors in a wide range of producer’s 
and consumer’s durable goods, moreover, was expected to limit income 
payments and thereby demand for nondurable consumer goods. 

On the basis of these considerations and allowance for a rapid decline 
in Government expenditures, a reduction of several billion dollars in 
the annual rate of consumer purchases of nondurable goods from the 
middle of 1945 to the end of the year was forecast by some last summer. 
As events turned out, consumer takings of nondurable goods were at 
an annual, seasonally adjusted rate of 66 billion dollars in the fourth 
quarter (a figure later revised to 69.5 billion) as compared with 59 
billion in the second quarter of the year. The rise occurred at a time 
when income payments and disposable income were somewhat re- 
duced. Thus, the usual relationship between consumer income and 
expenditures for nondurable goods did not hold in this period and, 
moreover, the decline in income was very much less than had been 
expected. 
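
The size of the rise just described can be worked out from the three figures given in the text. The sketch below uses only those figures; it does not attempt to quantify the forecast decline, which the text gives only as "several billion dollars". The function name is an assumption for illustration.

```python
def pct_change(start, end):
    """Percentage change from start to end."""
    return 100.0 * (end - start) / start


# Annual, seasonally adjusted rates in billions of dollars, as quoted in the text.
SECOND_QUARTER_1945 = 59.0
FOURTH_QUARTER_FIRST_ESTIMATE = 66.0
FOURTH_QUARTER_REVISED = 69.5

rise_first = FOURTH_QUARTER_FIRST_ESTIMATE - SECOND_QUARTER_1945
rise_revised = FOURTH_QUARTER_REVISED - SECOND_QUARTER_1945

print(f"first estimate: +{rise_first:.1f} billion, "
      f"{pct_change(SECOND_QUARTER_1945, FOURTH_QUARTER_FIRST_ESTIMATE):.1f} per cent")
print(f"revised figure: +{rise_revised:.1f} billion, "
      f"{pct_change(SECOND_QUARTER_1945, FOURTH_QUARTER_REVISED):.1f} per cent")
```

On either figure the movement was a rise of well over a tenth, where a decline had been forecast.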

Now we might brush aside this failure to forecast even the approxi- 
mate level of consumption outlays with the observation that this was 
an especially difficult period to forecast by any method. Or, we might 
interpret the continued strong consumer demand for nondurables as 
being so temporary that earlier forecasts would be fulfilled—albeit at a 
somewhat later date than originally suggested. Or, we might take fur- 
ther counsel about the forecasting procedure itself, recognizing that the 
period immediately ahead now will also be unusual and perhaps of 
exceptional importance; and that almost no period is of an average sort 
except some period in the fairly distant future which can not be ex- 
amined very closely in advance. 

My first suggestion is that full recognition should be given (as it has
not been heretofore) to the many influences that bear on changes in
consumption over time and especially to the varied importance of these 
different influences in different periods. To regard consumer expendi- 
tures, or even consumer expenditures for nondurables, simply as a 
variable dependent on disposable income is too restrictive, no matter
what the formula. It is true that in peacetime many fluctuations in
economic activity stem from business decisions about the purchase of 
inventories, new plant and equipment, and the like; and that such
fluctuations affect consumer incomes and expenditures. It is also true 
that considerations other than the consumer market may be paramount 
in such decisions. Almost always, however, these business decisions are 
made partly with a view to the consumer market. Moreover, changes 
occur in consumer stocks of automobiles and other durable goods—and 
also of semidurable goods—which may have important repercussions 
in the markets for these products, especially if they have not been ade- 
quately foreseen. After the end of the war such influences were of great 
importance and there was also a rapid increase in the civilian popula- 
tion, with the return of several million persons from the armed forces. 
Returning veterans, moreover, had special demands, chiefly for men’s 
clothing and the like. Consumers generally were in a stronger financial 
position than would be usual at existing income levels, because of war- 
time savings which had been only partially devalued by advances in the 
cost of living. Clearly, in forecasting consumption in such a period, 
demand for semi-durable as well as durable consumer goods should be 
regarded as at least semi-autonomous; and demand even for perishables 
should be estimated partly on the basis of factors other than current 
income. In 1946 the strength of the consumer’s financial position and 
of his desires and the paucity of his stocks of goods may continue to 
have a real influence on buying, keeping consumer expenditures for 
many “nondurables,” for example, “out of line” with past relationships 
and making the problem of price control more serious than many expect 
it to be. For later, more “normal” years, the point may be less pertinent 
but, again, the “normal” year, or quarter, may be elusive. 

Once the position is accepted that consumer expenditures are autono- 
mous to a significant degree, the estimates of the gross product must 
be made partly on the basis of independent consumption estimates; 
and, further, the estimates for capital formation may need to take these 
consumption estimates into account—unless, perchance, as in the most 
recent period, they have been based primarily on the estimates of prac- 
tical capacity rather than of demand, which has been expected to be 
far in excess of supplies in the field of capital formation. The degree to 
which consumption expenditures should be regarded not only as 
autonomous but also as determining certain other items needs to be 
considered in appraising the particular situation developing at any 
particular time. 

The second suggestion I have to offer is that more attention should 
be given to the measurement of past and current consumption, especial- 
ly in real or physical volume terms. Debate about the extent to which 
consumer habits may be expected to change under given circumstances
could be raised to a considerably higher level if more were known about 
the actual course of consumption. Last summer, for example, it would 
have been very pertinent to know how consumption in real terms stood 
in relation to 1939 and 1941. Had we, as many forecasters thought, not 
only avoided a “bed-rock” level of consumption but actually increased 
aggregate civilian consumption by 20 per cent? (And per capita some- 
what more?) Or was consumption little, if any, above the 1939 level, 
as some others thought? Or was it in a range of 7 to 12 per cent (10 to 15 
per cent per capita) above 1939 as the Combined Resources Board 
group concluded about that time with reference to the year 1944? 
It seems evident that the level was lower in mid-1945 than in 1944. 
If this is allowed for and if certain debatable downward adjustments are 
made in the Combined Resources Board figures for 1944, the mid-1945 
level for aggregate civilian consumption would appear to be little above 
the 1939 level and below the 1941 level—with very marked differences, 
of course, for particular groups. Whatever the correct answer may be 
as to the level last summer, it seems clear that, if possible, in the future 
enough work should be done to establish the facts and to avoid such 
sharp divergencies of view as to the level of consumption prevailing at 
the beginning of any particular period. And the information at hand 
should be reasonably accurate by groups of consumption items as well 
as for the total if the significance of any level for the total is to be 
properly appraised. This is a matter, it may be added, of considerable 
importance in estimating the comparative courses of production and 
consumption in the period immediately ahead and therefore in estimat- 
ing when the present expansion period will be terminated. 
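
The aggregate and per capita ranges quoted above for 1944 are linked by the change in the civilian population, and the link can be checked with a short calculation. This is an inference from the arithmetic rather than a figure stated in the text; pairing 7 with 10 per cent and 12 with 15 per cent is an assumption of the sketch, as is the function name.

```python
def implied_population_change_pct(aggregate_pct, per_capita_pct):
    """Population change implied by an aggregate and a per capita increase.

    Uses the identity: aggregate index = per capita index * population index.
    """
    ratio = (1.0 + aggregate_pct / 100.0) / (1.0 + per_capita_pct / 100.0)
    return 100.0 * (ratio - 1.0)


# Ranges quoted in the text for 1944 relative to 1939.
low_end = implied_population_change_pct(aggregate_pct=7.0, per_capita_pct=10.0)
high_end = implied_population_change_pct(aggregate_pct=12.0, per_capita_pct=15.0)

print(f"implied population change, low end:  {low_end:.2f} per cent")
print(f"implied population change, high end: {high_end:.2f} per cent")
```

Both ends of the range imply a civilian population somewhat below the 1939 level, which is why the per capita increases exceed the aggregate ones.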

The third suggestion I offer for consideration is that in making fore- 
casts of consumption and the economic situation generally, more 
thought should be given to changes in prices, costs, and the bargaining 
process generally. In the predictions framed in gross product analysis 
terms, note is usually made of the fact that the figures shown for various 
types of expenditures are in terms of constant 1944 or 1945 prices as 
the case may be. Note may also be made of the fact that a stable price 
situation is assumed in making the estimates, at least tentatively until 
it can be seen how large aggregate demand may be relative to aggregate 
supply and therefore what the general price situation may be. But right 
now one of the chief questions which needs to be considered in the 
preparation of estimates of consumption expenditures and capital for- 
mation is what prices will do or will be expected to do. And these ques- 
tions, in my view, can not be answered by a global comparison of 
supply and demand. Prices are made in many markets and price
changes are often cumulative. They are made in real estate and other 
transactions where only transfer payments are included—as well as in 
markets for currently produced goods to which the gross product table 
refers. They may be so high as to alter considerably the actual course 
of activity—they were in 1919-20—and are practically certain to 
affect the level of income payments, as, for example, when wage rates 
are increased. At the moment production is interrupted while bargain- 
ing over terms of employment goes on. When the various influences of 
changes in prices and costs and the negotiation thereof are considered, 
the estimating process for consumption expenditures and other items 
may be more difficult but should be in some respects more realistic. 

It even appears that in some periods the most significant develop- 
ments may be easier to see when the whole approach is altered and 
instead of asking how much employment or unemployment there will 
be, the question asked is what will happen to prices. I hasten to dis- 
claim any desire to play down the importance of employment or 
activity; rather, I would like to see what additional can be learned from 
another approach. As one illustration of what I have in mind, the in- 
terpretation of recent inventory figures may be mentioned. In the gross 
product table, inventory changes shown are small, of the order of three 
billion dollars annually, and their significance is not increased much by 
allowing for their effect on consumer expenditures. But why is the 
estimated change no larger than this? It is recognized that producers 
and distributors are eager to replenish their inventories but it is 
thought that they can not get the supplies they seek, except over a long 
period. If that conclusion is right, it means that for a long time pressure 
to increase prices and to trade in the black market will be great, if 
price ceilings are held and enforced in the usual trade channels. Within 
limits, the smaller the increases shown for inventories, the more serious 
—in a period like this—is the likelihood of further inflationary price 
advances. And ultimately, I may add, the greater is the danger of un- 
employment in a period of reaction from a speculative price rise. The 
same general analysis applies to the net export figures in the gross prod- 
uct table—if exports are small simply because the goods are not avail- 
able, it means that many markets are continually in a tight position 
and that price increases can be avoided only by authoritarian measures. 
If this same analysis is made for residential building, the evidence is 
clear right now. People are ready to buy many more new houses than 
can be built in 1946 and prices are continuing to rise—to levels, in- 
cidentally, which are far beyond what the cost indexes, with constant 
percentage margins for contractors, would suggest. In a period of this
sort a figure which shows only how much will be spent for new houses 
at constant prices or how many homes will be built gives little idea of 
the demand for new houses and provides no adequate basis for forecast- 
ing price developments in that field. Nor does it even contribute 
properly to an estimate of global demand to be matched against global 
supply. As I see it, analysis of the importance of price changes, of the 
way they develop and cumulate, and of the possible methods of fore- 
casting them needs to be developed by methods somewhat different, 
and in a sense more comprehensive, than those used in the gross 
product approach. 

Clearly, the suggestions offered here imply something about the 
emphasis to be placed on the development of mathematical devices in 
forecasting. Without in any way wishing to disparage the use of 
mathematics at appropriate points, I think it is important to recognize 
that there are very real limits to what can be accomplished by trying 
to describe in mathematical terms an economic world in which changes 
stem from so many influences of such varied significance in different 
periods. My emphasis—and this is the fourth of the suggestions about 
forecasting—would be on improving our information (about inven- 
tories, investment in plant and the like, as well as about consumption) 
and developing an analytical approach sufficiently broad and flexible 
to promote understanding of all the different types of situations which 
may lie ahead. No formulae or systems of formulae tied tightly to the 
past (a short past, too, for data of much reliability) can do anywhere 
near all the job. And efforts to formalize knowledge may divert atten- 
tion from establishing consistency between estimates of the parts and 
the facts to establishing internal consistency of a sort within a system 
of figures. The choice of independent variables, as noted in the discus- 
sion of the first point above, may be arbitrary and give rise to mislead- 
ing results. Perhaps all this is just another way of saying that the judg- 
ment of experienced people familiar with developments in economic 
and political affairs, domestic and international, is one essential ele- 
ment in good forecasting. Earlier I referred to the “art of forecasting”; 
that phrase was chosen deliberately. 

Implied in all this discussion of forecasting consumption and eco- 
nomic activity generally is a conclusion which may be stated as a 
fifth point: that the art of forecasting is not yet fully developed—or 
perhaps even that there are narrow limits to what can be done in this 
direction—and that forecasters should be careful not to claim too much 
for their present methods. For many purposes, such as the preparation 
of budget messages, forecasts must be made currently by the best
means available at any particular time and then be defended. But it 
hardly seems desirable, in discussing the feasibility of a full employ- 
ment program, to place great emphasis on the improvement made to 
date in forecasting techniques. Rather, those who support a full em- 
ployment program, as I do, might better argue that such a program is 
essential and that, in order to carry out some parts of the program, 
provision should be made for obtaining much better information than 
we have and for stimulating efforts to learn more about forecasting. 

The future. The final section of this paper was to deal with the future
of measurement and forecasting in the field of consumption but the
crystal ball is clouded and I see no formulae for predicting such matters. 
The bibliography on this phase of the problem is fairly short and the 
opportunities for people of imagination to make real contributions are 
great.

Even so, the present statistical situation in this field is very much
improved over that prevailing immediately after the First World War,
at least in absolute terms. What it may be in relation to needs, I leave 
you to judge. Looking back over the history of the period between the 
wars, it seems evident that the need for information and analysis in this 
field was great right along. An understanding of what was happening 
in consumption as compared with what was happening in capital forma- 
tion might have provided the basis for better policy in the nineteen 
twenties, although the temper of the times was such that any advice 
economists might have given on this subject probably would have been 
little heeded. In the past decade considerable thought has been given 
to objectives and methods and there are numerous economists familiar 
with problems in the measurement and forecasting of consumption. 
Moreover, there are many more people in the community generally who 
are aware of such problems. Consequently, there is more chance that 
the thought of economists may be pertinent and be reflected in 
action. It is true that recent carefully considered proposals for a large 
study of family income and a companion study of consumer purchases 
have not yet been approved in Congress. But this should not be re- 
garded as too discouraging under existing conditions. 

The problem of developing satisfactory time series for the physical 
volume of consumption and for the physical volume of production of 
goods for consumers is real and will need to be explored carefully. Pre- 
liminary investigation of production data, such as has been made at 
the Reserve Board, and the work of the Combined Resources Board 
group show that the underlying data available, particularly on a current
basis, fall considerably short of what might be desired, as, for
example, in the field of textiles. 

In the international area the occasion for further development of 
information and analysis in the field of consumption may not appear 
with the same force as it is likely to in the domestic situation, and with 
the difficulties of comparison much greater because of much greater 
cultural differences, the limits on what can be found out are no doubt
narrower. Nevertheless, the question of relations between the nations 
which “have” and those which “have not” is a very basic one, even as 
the question of distribution of income and consumption within a coun- 
try is; and sooner or later we may see developments in the field of inter- 
national statistics on consumption much greater than it now seems 
reasonable to expect. 

Whatever the developments in measurement may be, there will 
always be forecasting, and forecasting of consumption will be part of 
the job. There are likely to be further changes in methods as the pres- 
sure of events is brought to bear on the work of forecasters and as more 
consideration is given to the choice of methods for appraising prospects. 
Sharper distinctions may be drawn in our thinking about various length 
time-periods and the extent to which forecasts may be useful and feasi- 
ble for six-month periods, two-year periods, and quarter-centuries. The 
tendency to overemphasize similarities between particular and average 
periods and to discount differences may give way gradually. We need 
not be too much surprised when any particular set of ABC series is 
superseded. We should continue to avoid crystallization on any narrow 
view and to develop all approaches appropriate to the wide variety of 
situations likely to develop. 








THE USE OF ADJUSTING FACTORS IN THE ANALYSIS OF 
DATA WITH DISPROPORTIONATE 
SUBCLASS NUMBERS* 


R. E. PATTERSON†


This paper presents a new method for the analysis of vari- 
ance of multiple classifications with unequal subclass num- 
bers. It is believed that the method is simpler and more 
expeditious than the standard method of fitting constants. 
The process of adjusting is accomplished by substituting in
the following equation:

$\bar{X}_{ij} - \bar{X}_j + \bar{X} = A_{ij}$

where $\bar{X}_{ij}$ is the mean of the $i$th subclass in the $j$th row or
column, $\bar{X}_j$ is the mean of the $j$th row or column, $\bar{X}$ is the
grand mean, and $A_{ij}$ is the adjusted mean of the $i$th subclass
in the $j$th row or column.

The method is based upon the assumption that the weighted 
sum of squares of the subclass means that are adjusted for the 
border mean effects is an efficient estimate of the variance 
due to interaction. Justification of this assumption is indi- 
cated by the fact that the difference between the differences 
of subclass means for a given classification is unchanged by 
the adjusting process. It is further demonstrated that if a 
sufficient number of adjustings are carried out the results 
will be the same as those obtained by the method of fitting
constants.


INTRODUCTION


IN THE field of animal science data are often encountered in which the
subclass numbers are neither equal nor proportional. The consequences
of this disproportionality in the subclasses have been reviewed
by Yates (1933) and Snedecor and Cox (1935). Considerable limitations 
have been imposed on the interpretations of such data, the solutions 
being generally considered, at best, only approximations. 

One of the oldest methods of analyzing data with unequal numbers is
that of correcting or adjusting for differences among the means. By 
this method two or more groups may be made, on the average, homo- 
geneous for some given effect. The more important second effect can 
then be studied and compared without the disturbing influence of the 
corrected or adjusted effect. Although this method has had extensive
use, especially by workers in the animal sciences, users usually feel
obligated to apologize for it even when no other methods are applicable
to the data.

* Credit and thanks are due to Professor C. B. Godbey of the Agricultural and Mechanical College
of Texas for help in the preparation of this paper for publication.
† Animal husbandman, Division of Range Animal Husbandry, Texas Agricultural Experiment
Station, Agricultural and Mechanical College of Texas.

The objects of this paper are (1) to show the relationship between the
elimination of variance by a method of adjusting and the segregation
of variance by the analysis of variance, (2) to demonstrate the use of
adjusting for segregating the confused variability in a two-way table
when the main effects are non-orthogonal because of disproportionate
subclass numbers, and (3) to offer a method of adjusting as a substitute
for the more algebraic and lengthy least square method of fitting
constants.


TABLE 1
THE NUMBER AND MEAN VALUE FOR MALES AND FEMALES FOR EACH OF FOUR
GENERATIONS*

                   Males             Females             Total
Generations    Number   Mean     Number   Mean     Number   Mean
1                15     12         15      6         30      9.00
2                15     10         15     10         30     10.00
3                15      8         15      7         30      7.50
4                15      9         15      5         30      7.00
Total            60      9.75      60      7.00     120      8.375

* Hypothetical data for the purpose of illustration.


TABLE 2
ANALYSIS OF VARIANCE OF THE DATA IN TABLE 1

                                      Degrees of     Sum of      Mean
Source of Variation                    Freedom       Squares     Square
Total                                    119         792.125
Between generation-sex subclasses          7         568.125       81
Within generation-sex subclasses         112         224.000        2
Between sex means                          1         226.875      227
Between generation means                   3         170.625       57
Interactions of sex × generation           3         170.625       57


ANALYSIS OF VARIANCE AND ADJUSTING IN DATA WITH EQUAL 
SUBCLASS NUMBERS 


In the analysis of a set of data two problems are confronted—one of 
estimating the variance due to several sources and the other of testing
the significance of these effects producing the variance. In data with
equal or proportionate subclass numbers, the method of analysis of 
variance gives both an estimation of the variance and a test of sig- 
nificance. The method of adjusting herein reported can be used to 
eliminate the variance for any effect that can be segregated by the 
analysis of variance. The utility of this property of adjusting shall be 
made clear when data with the disproportionate subclass numbers are 
discussed. 

For illustrating the elimination of variance by the method of adjust- 
ing, the hypothetical data of Table 1 will be used. The results obtained 
from these data by the usual method of analysis of variance are shown 
in Table 2. 


TABLE 3
SUBCLASS MEANS OF TABLE 1 ADJUSTED FOR DIFFERENCES IN SEX MEANS

                   Males             Females             Total
Generations    Number   Mean     Number   Mean     Number   Mean
1                15    10.625      15     7.375      30      9.00
2                15     8.625      15    11.375      30     10.00
3                15     6.625      15     8.375      30      7.50
4                15     7.625      15     6.375      30      7.00
Total            60     8.375      60     8.375     120      8.375

Adjusting factors for sex: Male = −1.375, Female = +1.375.


TABLE 4
ANALYSIS OF VARIANCE OF THE DATA IN TABLE 3

                                      Degrees of     Sum of      Mean
Source of Variation                    Freedom       Squares     Square
Total                                    119         565.250
Between generation-sex subclasses          7         341.250       49
Within generation-sex subclasses         112         224.000        2
Between sex means                          1           0            0
Between generation means                   3         170.625       57
Interactions of sex × generation           3         170.625       57

As noted in Table 1, the two sex means differ from the grand mean by
+1.375 for the males and −1.375 for the females. These differences
show how much each sex is above or below the average of the entire
data. If the individuals within each sex are adjusted or corrected in such
a way that each sex mean will be equal to the population mean and equal
to each other, there will, of course, no longer be a difference between
the sexes. This can be accomplished by the following equation:

$X_{ij} - \bar{X}_j + \bar{X} = A_{ij}$    (1)

where $X_{ij}$ is the $i$th individual in the $j$th row or column, $\bar{X}_j$ is the mean
of the $j$th row or column, $\bar{X}$ is the grand mean, and $A_{ij}$ is the adjusted
$i$th individual in the $j$th row or column. It then follows that

$\frac{\sum_i A_{ij}}{N_j} = \bar{X}$    (2)

As this is a type of coding that does not affect the variability within the
subclasses, it is therefore not necessary to correct each value in the
subclass but only their mean:

$\bar{X}_{ij} - \bar{X}_j + \bar{X} = A_{ij}$    (3)

where $\bar{X}_{ij}$ is the mean of the $i$th subclass in the $j$th row or column and
$A_{ij}$ is the corrected mean of the $i$th subclass in the $j$th row or column.


TABLE 5
SUBCLASS MEANS OF TABLE 3 ADJUSTED FOR DIFFERENCES BETWEEN
GENERATION MEANS

                   Males             Females             Total
Generations    Number   Mean     Number   Mean     Number   Mean
1                15    10.000      15     6.750      30     8.375
2                15     7.000      15     9.750      30     8.375
3                15     7.500      15     9.250      30     8.375
4                15     9.000      15     7.750      30     8.375
Total            60     8.375      60     8.375     120     8.375

Adjusting factors for generations: 1. −0.625; 2. −1.625; 3. +0.875; 4. +1.375.


Such an adjusting procedure was applied to the sex differences in 
Table 1, giving the resulting values shown in Table 3. When the 
analysis of variance was applied to these adjusted values, the values of 
Table 4 were obtained. Comparing Table 2 with Table 4, it is observed 
that the variance between generations, interaction of generation X sex, 
and error or within subclasses are the same but the between sexes is 
zero in Table 4, while the total sum of squares is reduced by an amount 
equal to between sexes of Table 2. Thus, the variance between sexes 
has been completely removed without affecting any of the other sources 
of variation.

Continuing the process of elimination by adjusting, the values of
Table 3 were adjusted for differences among generations. The genera- 
tion adjustments of the subclass means of Table 3 are shown in Table 5. 
An inspection of this table reveals that there is still no difference be- 
tween the sex means and now no differences among the generation 
means. The results of analysis of variance are shown in Table 6. Here, 
the total sum of squares is reduced by an amount equal to the sum of 
the between sexes and between generations of Table 2. There remains 
in the data, therefore, only that variation attributed to the interactions 
of sex X generation and the within subclass variation. 
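
The two eliminations just described can be retraced numerically. The following is a minimal sketch in Python, assuming the hypothetical subclass means and numbers of Table 1; the function names (`adjust_rows`, `adjust_cols`, `anova_parts`) are illustrative and are not part of the original paper. It applies equation 3 first to the sex means and then to the generation means, and recomputes the weighted sums of squares after each step.

```python
# Subclass means from Table 1: rows are generations 1-4, columns are (males, females).
MEANS = [[12.0, 6.0], [10.0, 10.0], [8.0, 7.0], [9.0, 5.0]]
COUNTS = [[15, 15], [15, 15], [15, 15], [15, 15]]  # equal subclass numbers


def transpose(table):
    return [list(col) for col in zip(*table)]


def grand_mean(means, counts):
    total = sum(n * m for mr, nr in zip(means, counts) for m, n in zip(mr, nr))
    return total / sum(map(sum, counts))


def row_means(means, counts):
    """Weighted border mean of each row (generation)."""
    return [sum(n * m for m, n in zip(mr, nr)) / sum(nr)
            for mr, nr in zip(means, counts)]


def col_means(means, counts):
    """Weighted border mean of each column (sex)."""
    return row_means(transpose(means), transpose(counts))


def adjust_rows(means, counts):
    """Equation 3 applied to rows: A_ij = X_ij - (row mean) + (grand mean)."""
    g, rm = grand_mean(means, counts), row_means(means, counts)
    return [[m - rm[i] + g for m in row] for i, row in enumerate(means)]


def adjust_cols(means, counts):
    """Equation 3 applied to columns: A_ij = X_ij - (column mean) + (grand mean)."""
    return transpose(adjust_rows(transpose(means), transpose(counts)))


def weighted_ss(group_means, group_counts, g):
    """Weighted sum of squared deviations of group means from the grand mean."""
    return sum(n * (m - g) ** 2 for m, n in zip(group_means, group_counts))


def anova_parts(means, counts):
    """Return (between sexes, between generations, between subclasses) sums of squares."""
    g = grand_mean(means, counts)
    ss_sex = weighted_ss(col_means(means, counts), map(sum, transpose(counts)), g)
    ss_gen = weighted_ss(row_means(means, counts), map(sum, counts), g)
    flat_m = [m for row in means for m in row]
    flat_n = [n for row in counts for n in row]
    return ss_sex, ss_gen, weighted_ss(flat_m, flat_n, g)


table3 = adjust_cols(MEANS, COUNTS)    # sex effect removed
table5 = adjust_rows(table3, COUNTS)   # generation effect removed as well

for name, table in [("Table 1", MEANS), ("Table 3", table3), ("Table 5", table5)]:
    ss_sex, ss_gen, ss_sub = anova_parts(table, COUNTS)
    print(name, round(ss_sex, 3), round(ss_gen, 3), round(ss_sub, 3))
```

Under these assumptions the between-subclass sum of squares falls from 568.125 to 341.250 and then to 170.625, the last figure being the interaction sum of squares of Table 2.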


TABLE 6
ANALYSIS OF VARIANCE OF THE DATA IN TABLE 5

                                      Degrees of     Sum of      Mean
Source of Variation                    Freedom       Squares     Square
Total                                    119         394.625
Between generation-sex subclasses          7         170.625       24
Within generation-sex subclasses         112         224.000        2
Between sex means                          1           0            0
Between generation means                   3           0            0
Interactions of sex × generation           3         170.625       57

An interesting feature of interaction is obtained from a comparison
of Tables 1 and 5. To illustrate, in Table 1 the difference between the
males of generation one and the males of generation two is +2.0, while
the difference between the females of generation one and the females
of generation two is −4.0. The range between these two differences is 6.0.
This is a type of variation that determines interaction. The differences
between the same two pairs of subclass means in Table 5 are +3.0 and
−3.0. The range between the two differences is 6.0 as before. This shows
that interaction has not been affected by adjusting for the main effects.
Although the differences may be changed, the range between the
differences will always remain the same. This will be considered again in
connection with disproportionate subclass numbers.
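
The comparison can be checked for every pair of generations, not only the first two. The short Python sketch below assumes the subclass means of Tables 1 and 5 as printed; the helper name `difference_of_differences` is illustrative.

```python
def difference_of_differences(table, r1, r2):
    """Return (male difference, female difference, range between them)
    for generations r1 and r2 of a generation-by-sex table of means."""
    male_diff = table[r1][0] - table[r2][0]
    female_diff = table[r1][1] - table[r2][1]
    return male_diff, female_diff, male_diff - female_diff


# Rows are generations 1-4; columns are (males, females).
table1 = [[12.0, 6.0], [10.0, 10.0], [8.0, 7.0], [9.0, 5.0]]    # unadjusted
table5 = [[10.0, 6.75], [7.0, 9.75], [7.5, 9.25], [9.0, 7.75]]  # adjusted for sex and generation

print(difference_of_differences(table1, 0, 1))  # (2.0, -4.0, 6.0)
print(difference_of_differences(table5, 0, 1))  # (3.0, -3.0, 6.0)

# The same holds for every pair of generations.
pairs = [(a, b) for a in range(4) for b in range(a + 1, 4)]
unchanged = all(
    difference_of_differences(table1, a, b)[2] == difference_of_differences(table5, a, b)[2]
    for a, b in pairs
)
print(unchanged)
```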

The interactions of sex X generation also can be removed from 
Table 5, leaving the subclass means equal and equal to the grand mean. 
The removal of interaction can be accomplished before or after either 
of the main effects is removed, by adjusting for each of three correctly
chosen sets of means. 

If the analysis of variance is made considering the main effects
eliminated, the corresponding degrees of freedom for those effects must be
considered utilized in the adjusting and removed from the total. Thus,
in the analysis of Table 4 where the sex effect was eliminated there
would be 118 degrees for total, three degrees for between generations,
three degrees for interaction, and 112 degrees for within subclasses, the
one degree of freedom for between sexes being utilized in the adjusting.
Ignoring the reduction in degrees of freedom when there has been a
reduction in sum of squares by adjusting might result in an erroneous
conclusion from the data.


ADJUSTING IN DATA WITH DISPROPORTIONAL SUBCLASS NUMBERS 


In data where the subclass numbers are unequal and disproportionate
the ordinary method of analysis of variance is not applicable. Snedecor
and Cox (1935) have pointed out some of the difficulties encountered in
such data. The principal one is that the addition theorem does not
apply. It is desirable to point out other direct consequences of
disproportionate subclass numbers upon the means of the main effects as well
as upon the interactions between these main effects. For discussion and
re-analysis the data given by Snedecor and Cox (1935) will be used.
These are shown in Table 7, where each of the means is decreased by
100.


TABLE 7
DATA TAKEN FROM SNEDECOR AND COX (1935)

                   Males               Females                Total*
Generations    Number    Mean      Number    Mean      Number     Mean
1                21     76.952       27      9.518       48     39.020375
2                15     61.467       25     14.080       40     31.850125
3                12     55.667       23      8.522       35     24.686000
4                 7     71.000       19      6.790       26     24.077308
Total*           55     67.327291    94      9.936191   149     31.120826

* The means for the border totals are carried six places in order to insure a more accurate
comparison with the values obtained by the least square method of fitting constants.

There is no question that the subclass means of a set of data are good 
estimates of the parameter subclass means, even when the numbers are 
disproportional. However, when the subclass numbers are dispropor- 
tional, the differences among the border means are not true estimates 
of the parameter differences, because these differences are determined 
not only by the effects of that classification but include also some of the 
effects that are exhibited in the other classification or classifications. In other
words, the main effects are not orthogonal, owing to disproportional subclass
numbers. In the example the difference between the sex means
is not a function of sex alone but also reflects some of the generation differences.
Likewise, the differences among the generation means not only
reflect all the effects of generations but a part of these differences is 
caused by differences between sexes. Fundamentally, this is the pri- 
mary reason the addition theorem does not hold—the main effects are 
confounded or non-orthogonal. 

Of the several methods that have been developed for the analysis of
data with disproportionate subclass numbers, the least square method
of fitting constants, as reported by Brandt (1933) and extended in its
application by Yates (1934), is considered to have the widest range of
application and the most accurate results. The primary assumption is
that there are no interactions in the population from which the sample
is drawn. In this method a set of constants is fitted to the data with the
condition that they determine a set of subclass means with zero
interactions but with the values of the main effects unchanged. Next is
calculated the reduction in sum of squares due to fitting the constants, and
from this an appropriate correction for disproportionate subclass
numbers is obtained.


TABLE 8
DATA OF TABLE 7 ADJUSTED TO ZERO SEX DIFFERENCE*

                   Males               Females                Total
Generations    Number    Mean      Number    Mean      Number     Mean
1                21     40.745535    27     30.702635    48     35.096404
2                15     25.260535    25     35.264635    40     31.513098
3                12     19.460535    23     29.706635    35     26.193686
4                 7     34.793535    19     27.974635    26     29.810493
Total            55     31.120826    94     31.120826   149     31.120826

Adjusting factors for sex differences: Male = −36.206465, Female = +21.184635.
* Example: to compute 40.745535, the mean of the males in generation one, add −36.206465 to
76.952 of Table 7.

By the method of adjusting, practically the same results are obtained 
as by the method of fitting constants. With this method the assumption 
is made that the sum of squares of the subclass means that are adjusted 
for the border effects is an efficient estimate of the variance due to 
interaction. If the difference between differences of subclass means for 
a given classification determines interaction, the assumption is highly
justified. That is, the border effects are not confused with the interaction
between these effects.

By using the data in Table 7, the results of the method of adjusting 
are compared with those obtained by Snedecor and Cox (1935). Con- 
sider that the first object is to estimate the sum of squares due to gen- 
eration means. This is to be based on generation means stripped of any 
influence of sex differences. Therefore we shall remove that variation 
due to sex by substituting in equation 3. This gives a set of subclass 
means that are not influenced by sex (Table 8). However, as the sex 
means had some influence of generations, as described above, the 
adjusting removed this also. When the analysis of variance is applied, 
the weighted sum of squares between generations is found to be
1659.13869. This is smaller than it should be for the reason just given. 
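
This first adjustment can be sketched in a few lines of Python, assuming the subclass means and numbers of Table 7 (male numbers 21, 15, 12, and 7). Variable names are illustrative. Because the sketch uses floating-point arithmetic rather than hand computation, its results may differ from the printed figures in the last decimal places.

```python
# Table 7: rows are generations 1-4, columns are (males, females).
MEANS = [[76.952, 9.518], [61.467, 14.080], [55.667, 8.522], [71.000, 6.790]]
COUNTS = [[21, 27], [15, 25], [12, 23], [7, 19]]

n_total = sum(map(sum, COUNTS))
grand = sum(n * m for mr, nr in zip(MEANS, COUNTS) for m, n in zip(mr, nr)) / n_total

# Weighted sex (column) means and the adjusting factors that bring them to the grand mean.
sex_n = [sum(row[j] for row in COUNTS) for j in range(2)]
sex_mean = [sum(COUNTS[i][j] * MEANS[i][j] for i in range(4)) / sex_n[j] for j in range(2)]
sex_factor = [grand - m for m in sex_mean]

# Equation 3: every subclass mean in a column receives that column's adjusting factor.
table8 = [[MEANS[i][j] + sex_factor[j] for j in range(2)] for i in range(4)]

# Weighted generation (row) means of the adjusted table and their sum of squares.
gen_n = [sum(row) for row in COUNTS]
gen_mean = [sum(n * m for m, n in zip(table8[i], COUNTS[i])) / gen_n[i] for i in range(4)]
ss_generations = sum(n * (m - grand) ** 2 for m, n in zip(gen_mean, gen_n))

print([round(f, 6) for f in sex_factor])   # adjusting factors for sex
print([round(m, 6) for m in gen_mean])     # generation means of Table 8
print(round(ss_generations, 2))            # close to the 1659.14 reported in the text
```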

TABLE 9
DATA OF TABLE 8 ADJUSTED TO ZERO GENERATION DIFFERENCES

                   Male                Female                 Total
Generation     Number    Mean      Number    Mean      Number     Mean
1                21     36.769957    27     26.727057    48     31.120826
2                15     24.868263    25     34.872363    40     31.120826
3                12     24.387675    23     34.633775    35     31.120826
4                 7     36.103868    19     29.284968    26     31.120826
Total            55     30.737677    94     31.345009   149     31.120826

Adjusting factors for generation differences of Table 8: 1. −3.975578, 2. −0.392272, 3. +4.927140,
4. +1.310333.


In order to recover some of the generation effects that were removed 
with sex because of the disproportionate subclass numbers, the sub- 
class means in Table 8 are adjusted for generation effects. Because of 
the disproportionate subclass numbers, the difference between the sex 
means will no longer be zero (Table 9). By calculating the between sex 
weighted sum of squares, 12.80149 is obtained. This value is part of the 
generation sum of squares removed with sex in the initial adjusting. It 
is perhaps well to emphasize here that the final resulting sum of squares 
will be the result of contributions from both sets of border means. 
Adding 12.80149 to 1659.13869 gives the sum of squares for between 
generations of 1671.94018, a value within the limits of rounding equal 
to that found by the method of fitting constants. 

The next step is to readjust for sex effects. When this is done, the
weighted sum of squares between generations is found to be 0.19049.
As this process is continued, each additional correction will recover a
part of the between generation sum of squares that was eliminated with
sex initially. In this example, however, the amounts recovered after the
first two adjustments are very small. The sum of squares for between
generations, as found in six adjustments, is 1672.14380, or the sum
1659.13869 + 12.80149 + 0.19049 + 0.00712 + 0.00371 + 0.00230,
representing the first to the sixth adjusting respectively.
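
The alternating procedure lends itself to a short loop. The Python sketch below assumes the Table 7 data and adjusts for sex first; the function names are illustrative. After each adjustment it records the sum of squares between the other set of border means, which is the quantity accumulated above. Exact floating-point arithmetic will not reproduce the hand-computed figures digit for digit, particularly for the very small later terms, so the sketch should be read as a check on the pattern rather than on the final decimals.

```python
# Table 7: rows are generations 1-4, columns are (males, females).
MEANS = [[76.952, 9.518], [61.467, 14.080], [55.667, 8.522], [71.000, 6.790]]
COUNTS = [[21, 27], [15, 25], [12, 23], [7, 19]]


def transpose(table):
    return [list(col) for col in zip(*table)]


def grand_mean(means, counts):
    total = sum(n * m for mr, nr in zip(means, counts) for m, n in zip(mr, nr))
    return total / sum(map(sum, counts))


def row_means(means, counts):
    return [sum(n * m for m, n in zip(mr, nr)) / sum(nr)
            for mr, nr in zip(means, counts)]


def adjust_rows(means, counts):
    """Equation 3 applied to rows: A_ij = X_ij - (row mean) + (grand mean)."""
    g, rm = grand_mean(means, counts), row_means(means, counts)
    return [[m - rm[i] + g for m in row] for i, row in enumerate(means)]


def between_rows_ss(means, counts):
    """Weighted sum of squares of the row border means about the grand mean."""
    g = grand_mean(means, counts)
    return sum(sum(nr) * (m - g) ** 2
               for m, nr in zip(row_means(means, counts), counts))


def alternate_adjustments(means, counts, steps):
    """Adjust for sex (columns), then generations (rows), and so on."""
    table, recovered = means, []
    counts_t = transpose(counts)
    for step in range(steps):
        if step % 2 == 0:   # adjust for sex, then measure generations
            table = transpose(adjust_rows(transpose(table), counts_t))
            recovered.append(between_rows_ss(table, counts))
        else:               # adjust for generations, then measure sexes
            table = adjust_rows(table, counts)
            recovered.append(between_rows_ss(transpose(table), counts_t))
    return table, recovered


final_table, recovered = alternate_adjustments(MEANS, COUNTS, 6)
print([round(x, 5) for x in recovered])  # contribution of each adjusting
print(round(sum(recovered), 3))          # adjusted sum of squares for generations
```

The contributions shrink rapidly, which is why only a few adjustings are needed in practice.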


TABLE 10
DATA OF TABLE 7 AFTER SIX ADJUSTMENTS, THE INITIAL ADJUSTMENT BEING
FOR SEX DIFFERENCES

                   Male                Female                 Total
Generation     Number    Mean      Number    Mean      Number     Mean
1                21     37.116597    27     26.457449    48     31.120826
2                15     25.253418    25     34.641270    40     31.120826
3                12     24.792637    23     34.422489    35     31.120826
4                 7     36.554203    19     29.119055    26     31.120826
Total            55     31.120743    94     31.120874   149     31.120826


TABLE 11
DATA OF TABLE 7 ADJUSTED TO ZERO GENERATION DIFFERENCES

                   Male                Female                 Total
Generation     Number    Mean      Number    Mean      Number     Mean
1                21     69.052541    27      1.618451    48     31.120826
2                15     60.737701    25     13.350701    40     31.120826
3                12     62.101826    23     14.956826    35     31.120826
4                 7     78.043518    19     13.833518    26     31.120826
Total            55     66.412610    94     10.471378   149     31.120826

Adjusting factors for generation differences of Table 7: 1. −7.899549, 2. −0.729299, 3. +6.434826,
4. +7.043518.


Table 10 shows the data after the process of elimination is continued 
six times, beginning first with the elimination of the sex differences. 
As may be observed in this table, each border mean is approximately 
equal to the grand mean. The variability among the subclasses is there- 
fore considered due to the effects of interactions. By calculating the 
weighted sum of squares for between subclasses, a value of 3182.40355 
is obtained. This is, within the limits of rounding, equal to the
interactions found by the least square method of fitting constants.
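
The interaction figure can be recomputed directly from Table 10. The Python sketch below assumes the subclass means, subclass numbers, and grand mean exactly as printed there; variable names are illustrative.

```python
# Table 10 as printed: rows are generations 1-4, columns are (males, females).
TABLE_10 = [[37.116597, 26.457449], [25.253418, 34.641270],
            [24.792637, 34.422489], [36.554203, 29.119055]]
COUNTS = [[21, 27], [15, 25], [12, 23], [7, 19]]
GRAND_MEAN = 31.120826

# Weighted sum of squares between subclasses; with the border effects
# removed this is taken as the interaction sum of squares.
interaction_ss = sum(n * (m - GRAND_MEAN) ** 2
                     for mr, nr in zip(TABLE_10, COUNTS)
                     for m, n in zip(mr, nr))

# Border means for the two sexes, which should sit close to the grand mean.
male_mean = sum(r[0] * n[0] for r, n in zip(TABLE_10, COUNTS)) / 55
female_mean = sum(r[1] * n[1] for r, n in zip(TABLE_10, COUNTS)) / 94

print(round(interaction_ss, 2))
print(round(male_mean, 6), round(female_mean, 6))
```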

By the same procedure the sum of squares for between sex means
can be obtained. After adjusting for generation effects of Table 7, the 
sum of squares for sex (Table 11) was found to be 108,584.62490. When 
adjusted for sex differences the sum of squares for between genera- 
tions (Table 12) was 1594.09223, which is actually due to sex. Then 
108,584.62490 plus 1594.09223 gives 110,178.71713. This value is still 
somewhat smaller than that found by the method of fitting constants.


TABLE 12
DATA OF TABLE 11 ADJUSTED TO ZERO SEX DIFFERENCES

                   Male                Female                 Total
Generation     Number    Mean      Number    Mean      Number     Mean
1                21     33.760757    27     22.267899    48     27.295985
2                15     25.445917    25     34.000149    40     30.792312
3                12     26.810042    23     35.606274    35     32.590423
4                 7     42.751734    19     34.482966    26     36.709173
Total            55     31.120826    94     31.120826   149     31.120826

Adjusting factors for sex differences of Table 11: Males = −35.291784, Females = +20.649448.


TABLE 13
DATA OF TABLE 12 ADJUSTED TO ZERO GENERATION DIFFERENCES

                   Male                Female                 Total
Generation     Number    Mean      Number    Mean      Number     Mean
1                21     37.585598    27     26.092740    48     31.120826
2                15     25.774431    25     34.328663    40     31.120826
3                12     25.340445    23     34.136677    35     31.120826
4                 7     37.163387    19     28.894619    26     31.120826
Total            55     31.638965    94     30.817680   149     31.120826

Adjusting factors for generation differences of Table 12: 1. +3.824841, 2. +0.328514, 3.
−1.469597, 4. −5.588347.


However, a considerable amount of sex influence was removed with 
generations and not all was recovered at the first readjustment. Read- 
justing for generation differences (Table 13) yields a sum of squares 
between sexes equal to 23.52606. This is also due to between sexes and 
when added to 110,178.71713, gives 110,202.24319, a value similar to 
that found by fitting constants. The next adjusting would yield a sum 
of squares less than 0.50. Two readjustments were thus needed to
regain most of the sum of squares for sex that was eliminated with
generation in the initial adjustment. If the elimination process is continued,
a set of subclass values would be obtained that are practically
equal to those of Table 10, where the initial adjustment removed the 
sex effect. 

In actual practice it is necessary only to find the unaffected adjusted 
sum of squares for one of the border effects, as the difference between 
the unadjusted and adjusted sum of squares for one effect is equal to 
the same type of difference for the other border effect. Thus, the dif- 
ference between the original sum of squares for generation (5756) and 
the adjusted (1672) equals 4084. To find the sum of squares for between 
sexes subtract 4084 from the unadjusted sum of squares (114,287), 
which gives 110,203, as obtained above for between sexes. Interaction 
may also be calculated from the subclasses that have been adjusted for 
either set of border means in the initial adjustment. 

Table 14 shows the combined results of the analysis of variance by
the method of adjusting. The mean square values when rounded off
are equal to those found by Snedecor and Cox (1935) from fitting
constants to the same set of data. The sums of squares by the two methods
differ because of rounding off and because a part of the sum of squares
removed in each initial adjusting was never recovered.


TABLE 14
ANALYSIS OF VARIANCE OF TABLE 7 BY THE METHOD OF ADJUSTING

                              Degrees of      Sum of         Mean
Source of Variation            Freedom        Squares        Square
Between sex means                  1        110,202.12      110,202
Between generation means           3          1,672.03          557
Interactions                       3          3,182.89        1,061
Within subclasses                141                            409

It will be observed that the differences between the differences of the 
subclass means of Table 10 are the same for corresponding values found 
in Table 7. For example, the difference between the two male subclasses 
for generations one and two in the unadjusted data is +15.485, while
that between the female subclasses for generations one and two is
−4.562. The difference between these two differences is 20.047. In
Table 10 the difference between the corresponding first pair of
subclasses is 11.863179 and between the second pair is −8.183821. The
difference between these two differences is 20.047000, the same as
above for the unadjusted data. If interaction is determined by such
differences, adjusting for the border means has in no way affected the
interactions. 
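
The reason for this invariance can be written out in one step. The derivation below is a sketch in notation introduced here for illustration: $i, i'$ index two generations, $j, j'$ the two sexes, and $r_i$ and $c_j$ stand for the accumulated row and column adjusting factors, so that after any number of adjustments $A_{ij} = \bar{X}_{ij} + r_i + c_j$.

```latex
\begin{aligned}
(A_{ij} - A_{i'j}) - (A_{ij'} - A_{i'j'})
 &= \bigl[(\bar{X}_{ij} + r_i + c_j) - (\bar{X}_{i'j} + r_{i'} + c_j)\bigr]
  - \bigl[(\bar{X}_{ij'} + r_i + c_{j'}) - (\bar{X}_{i'j'} + r_{i'} + c_{j'})\bigr] \\
 &= (\bar{X}_{ij} - \bar{X}_{i'j}) - (\bar{X}_{ij'} - \bar{X}_{i'j'}) .
\end{aligned}
```

The column factors cancel within each bracket and the row factors cancel between the brackets, so the difference between differences is the same before and after adjusting, whatever the subclass numbers.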

Furthermore, the method of adjusting applied to data computed from 
fitted constants, so that the interactions are zero, gives the same esti- 
mates of sums of squares for the main effects as those obtained when 
the original data are used. Thus, the sizes of the estimates of sums of 
squares for the main effects are not influenced by the amount of inter- 
action present. There is also no evidence that the variances for the 
border effects are less well estimated as the departure of the subclass 
numbers from proportionality becomes greater. In such cases, however, 
the border effects become more confused or confounded. 

The adjusted border means may also be calculated by the method of 
adjusting. Such calculated means are devoid of the confounding effect 
brought about by disproportionate subclass numbers. The adjusted 
mean of generation one is found, for example, by adding to the sex ad- 
justed generation mean the difference between the succeeding sex 
adjusted means and the grand mean. Thus, the original sex adjusted 
generation one mean is 35.09640. The same generation after adjusting 
for sex the second time is 31.16235, which differs from the grand mean 
by 0.04152. The third sex adjusted generation one mean is 31.12144 
with a difference of 0.00061. Further sex adjustments will result in continually
smaller differences. The mean for generation one will be, for the
three sex adjustings, 35.09640 + 0.04152 + 0.00061 = 35.13853. In a similar
manner, generation two is found as 31.51672. The difference between
these two means is 3.62181. This is in very close agreement with
the difference between the two (b) constants for the same two genera- 
tions. By the same method, adjusted sex means can be obtained as well 
as the difference between sexes unaffected by generation differences. 
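
The accumulation of adjusted generation means can be sketched as follows, assuming the Table 7 data and three sex adjustings as in the example above. The function names are illustrative, and the floating-point results may differ from the printed figures in the last decimal place.

```python
# Table 7: rows are generations 1-4, columns are (males, females).
MEANS = [[76.952, 9.518], [61.467, 14.080], [55.667, 8.522], [71.000, 6.790]]
COUNTS = [[21, 27], [15, 25], [12, 23], [7, 19]]


def transpose(table):
    return [list(col) for col in zip(*table)]


def grand_mean(means, counts):
    total = sum(n * m for mr, nr in zip(means, counts) for m, n in zip(mr, nr))
    return total / sum(map(sum, counts))


def row_means(means, counts):
    return [sum(n * m for m, n in zip(mr, nr)) / sum(nr)
            for mr, nr in zip(means, counts)]


def adjust_rows(means, counts):
    """Equation 3 applied to rows: A_ij = X_ij - (row mean) + (grand mean)."""
    g, rm = grand_mean(means, counts), row_means(means, counts)
    return [[m - rm[i] + g for m in row] for i, row in enumerate(means)]


def adjusted_generation_means(means, counts, sex_adjustments=3):
    """Accumulate generation means over repeated sex adjustments.

    The first sex-adjusted generation means are taken whole; each later
    sex adjustment contributes only its deviation from the grand mean.
    """
    g = grand_mean(means, counts)
    counts_t = transpose(counts)
    table, result = means, None
    for _ in range(sex_adjustments):
        table = transpose(adjust_rows(transpose(table), counts_t))  # adjust for sex
        gen = row_means(table, counts)
        result = gen if result is None else [a + (m - g) for a, m in zip(result, gen)]
        table = adjust_rows(table, counts)                          # adjust for generations
    return result


adjusted = adjusted_generation_means(MEANS, COUNTS)
print([round(m, 5) for m in adjusted])
print(round(adjusted[0] - adjusted[1], 5))  # generation one minus generation two
```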

Both sets of adjusted border means can be found when the initial 
adjustment has been for either one of the effects. In other words, the 
sex means can be calculated when the initial correction was for the sex 
difference. For example, the sum of the unadjusted mean for the males
(67.327291) and the differences between sex and generation adjusted
male means and the grand mean (−0.383149 plus −0.005625) equals
66.938517. By the same method the mean for the females was found
to be 10.163665. The difference between the two sexes is, therefore,
56.774852, a difference similar to that between the (a) constants of the
least square method of fitting constants. Thus, it is necessary only to 
carry out the elimination process for one set of border means and then 
make as many recovery adjustments as may be considered necessary.


SUMMARY AND CONCLUSIONS


A method of adjusting is described whereby the sum of squares of the 
various sources of variance can be eliminated. When this method of 
adjusting is applied to data with unequal subclass numbers, it is pos- 
sible to obtain a sum of squares for each source of variance that is free 
of the influence of the other effect. 

The method of adjusting is applied to data with unequal subclasses 
that had been analyzed by the least square method of fitting constants, 
showing that the same results can be obtained by both methods. It is 
indicated that the adjusting method is much simpler and requires less 
laborious mathematical computations than the method of fitting con- 
stants and is therefore offered as a substitute for it. 

Although the method of adjusting has not been tested extensively 
nor subjected to algebraic proof, it has given results similar to those 
obtained by the method of fitting constants in several two-way sets of 
data.¹ It seems safe, therefore, to conclude that the method can be
substituted for the least square method in data where the latter is
appropriate.

¹ W. G. Cochran informs me that it can be proved algebraically that if a sufficient number of
adjustments are carried out, the method of adjusting gives exactly the same results as the method of
fitting constants.




A PUNCHED CARD METHOD FOR PRESENTING,
ANALYZING, AND COMPARING MANY SERIES 
OF STATISTICS FOR AREAS 


BERTRAM J. BLACK 
Former Director, Bureau of Social Research 
Federation of Social Agencies of Pittsburgh and Allegheny County 
AND 
EDWARD B. OLDS
Director, Research Bureau 
Social Planning Council of St. Louis and St. Louis County 


Information about small areas such as census tracts, wards, 
cities, counties, etc., should be available to the consumer in 
a compact readable form. To meet the problem of multi- 
tudinous tables in large expensive volumes, the method de- 
scribed below was devised for the analysis and publication 
of data concerning the neighborhoods of Pittsburgh and 
Allegheny County. By using punched card procedures, making 
use of summary cards, machine multiplication and other 
techniques, a set of cards was prepared containing the social 
characteristics of each area and including percentile rankings 
identifying the relative position of each area for a particular 
characteristic. The statistical tables were reproduced directly 
from the listing sheets. Instead of repeating identical heading 
stubs for each of the tables, one set of stubs was printed on 
overlapping tab cards bound to the right side of the book. 
This alone makes for a most efficient use of paper and reduces 
the size and cost of the books. This method of analyzing and 
presentation can be applied to data for cities, counties, and 
metropolitan areas, as well as to small areas within cities. 


THE increasing quantity of available information for small areas
such as census tracts, wards, study areas, cities, and counties is
making the work of analysis, comparison, and publication more and 
more complicated and time-consuming. The consumer of statistics for 
a particular area does not usually have the time nor the inclination to 
spend weeks calculating rates and percentages so as to determine the 
standing of his city or neighborhood among others. Neither does he 
wish to consult a dozen different sources for his basic information. The 
data should be available to him in a compact, readable form so an- 
alyzed that he can see how his community compares with others in the 
most important characteristics for which statistics have been compiled. 

This problem has only partially been solved by the publication of 
tables which show percentages, rates, and rankings for a group of cities, 
counties or census tracts. Physical limitations upon the size of tables
necessarily prohibit the presentation of more than about half a dozen
series in one table. To obtain information on several hundred series, it is 
necessary to search scores of tables. This necessarily requires a large,
cumbersome, expensive volume.

Another approach to the problem has been attempted through the 
publication of specially prepared booklets for each area. This is an 
ideal solution if funds are available to cover the cost of publication. 
Unless the areas are large counties or cities, it is doubtful whether the 
returns from sales can be expected to cover costs. 

The method of meeting the problem outlined in this paper has been 
applied to the analysis and publication of data concerning the neighbor- 
hoods of Pittsburgh and Allegheny County. The techniques used grew 
out of experience gained over the preceding decade in the analysis of 
small area data and in their presentation. Through the assistance of 
NYA and WPA projects during the Depression years, many special 
tabulations had been made of local data.¹ Most of these data had been 
summarized in a series of “Social Facts Booklets.” These consisted of 
mimeographed forms on which the data were typed. The preparation of 
copies of these booklets proved to be slow and expensive. Many users 
were interested in more than one booklet, which further complicated the 
problem of distribution. 

The vast quantity of new data from the 1940 Population and Hous- 
ing Census which needed to be analyzed and compared with the 1930 
census data, as well as with data from numerous local tabulations, pre- 
sented a real problem in clerical man hours. By the time the data were 
available, clerks and calculating machines were difficult to find. The 
choice seemed to narrow down to either postponing extensive analyses 
until after the war or developing some labor-saving method of doing the 
job. Because of the great interest in the data for post-war planning 
purposes, it did not seem wise to postpone. Accordingly, considerable 
work was done in the summer of 1942 in the development of a plan for 
the analysis and publication of the material through the use of punched 
cards. 

Since the basic data were available in most instances by census 
tracts, the plan provided for the key punching of one card for each 
census tract for each series to be analyzed. The basic card form used is 
illustrated in Fig. 1 (Card I, the upper half of the card headings). The 
design of this card was changed many times during the course of laying 
out the plan for the project. Only the last 21 columns of this card were 
manually punched, and in some instances only the last 10 columns. 
These fields, showing the denominator or base used in computing 
percentages or rates, the actual per cent, rate, or average, the numerator 
or frequency for the particular area for the particular series, and the 
area code number, were punched directly from the published census tract 
figures wherever possible, or from carefully prepared work sheets. The 
fields labelled “Gang Punch I” were then punched by the reproducing 
machine to indicate the various codes for the particular series. These 
codes will be described later. After all the cards were punched and 
verified for the various series, the cards were sorted by area number. 
Previously punched master cards were then used to reproduce area codes 
and descriptions into the fields labelled “Gang Punch II.” 

1 Edward B. Olds, “The Use of NYA Workers in Ecological Studies,” Social Forces, Vol. 20, No. 2, 
December, 1941, pp. 218-223. 

Another type of card form shown in Fig. 1 (Card II, the lower half 
of the card headings) was used to obtain summaries of frequency dis- 
tributions for such series as “Monthly Rent of Homes.” To compute 
medians for areas made up of two or more census tracts, it was neces- 
sary to obtain totals for each class of the frequencies for each tract. 
Columns 3 to 11 on this card form contained the same area code in- 
formation as the Card I form on the upper half of Fig. 1. 

The area codes were used to indicate the code number of the embrac- 
ing study area, service area, or minor civil division for the particular 
census tract. By sorting on these codes, it was possible mechanically to 
punch summary cards for the various types of summary areas. In this 
manner it was not necessary to manually add or to key punch the 
cards for summary areas. Since the first two digits of the Pittsburgh 
census tracts indicate the Ward number, it was possible to obtain 
summary cards for wards by sorting and controlling on this number. 
The field labelled “type of MCD” was used to designate whether the 
minor civil division was a Borough, Township, or City and whether or 
not the area fell within the census definition of an urban place. The 
field labelled “Type of Area” indicated whether the area was a census 
tract, ward, study area, service area, minor civil division, county serv- 
ice area, or county ward. These various types of areas had developed 
historically as the most useful areas for purposes of neighborhood and 
community planning. 

The population weight for each area (Field R) was arrived at by 
dividing the 1940 population of the area by 100. It was used as the 
weight of the area in computing the percentile ranking (to be described 
later). 

The Map Coordinate Number was used to indicate the exact position 
of each area on the census tract base map for Allegheny County. 
This map was drawn so that it could be run through a tabulator and 
have decile rankings listed directly on each census tract as the basis 
for subsequent cross hatching. However, this method proved impractical, 
and the maps were manually prepared from straight machine listings. 

The reciprocal of the denominator was gang punched in along with 
the area codes where the same base was applicable to a number of 
series. For example, the total population of an area was used over and 
over to compute percentages and rates. By punching the reciprocal of 
population into the cards for all these series, it was possible to compute 
the percentages by machine multiplication. For this purpose a 
multiplier was used. 
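The reciprocal device can be sketched in a few lines. This is only an illustration of the arithmetic, not a description of the machine procedure; the population and frequencies are hypothetical figures, and the function name is introduced here for the example.

```python
def percent_via_reciprocal(frequency, reciprocal_of_base, places=1):
    """Form a percentage by multiplication only, as a multiplier would."""
    return round(100 * frequency * reciprocal_of_base, places)


# Hypothetical area with 4,250 residents; the reciprocal is computed once
# and then carried on every card that uses total population as its base.
population = 4250
reciprocal = 1 / population

# Hypothetical frequencies for two series sharing that base.
frequencies = {"foreign_born": 612, "under_5_years": 340}

percentages = {name: percent_via_reciprocal(count, reciprocal)
               for name, count in frequencies.items()}
print(percentages)
```

The point of the design is that the division is done once per area rather than once per series, which matters when the same base recurs across many series.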

The name of the area was gang punched into each card to provide 
easy identification when the cards for each series were later listed in 
order by the percentages. This listing showed the names of the areas 
that were highest and lowest as well as all the shades in between. 

It was decided to use “percentile rankings” to identify the relative 
position of each area in the array for a particular series. This seemed 
superior to simple ranks because it was independent of the number of 
areas in the array. For the publications herein described, the number 
of areas varied as follows: 


    Minor civil divisions        126 
    County census tracts         297 
    County service areas          26 
    Pittsburgh service areas      19 
    Pittsburgh study areas        53 
    Pittsburgh wards              32 
    Pittsburgh census tracts     194 


In addition, there were different numbers of minor civil divisions and 
of census tracts in 1930 and 1940. By reducing all ranks to percentile 
ranks, it was possible to indicate the standing of an area in a particular 
series without reference to these differences. 

The percentile ranks can be used to divide the areas into other de- 
sired groupings. For example, to divide the Pittsburgh census tracts 
into thirds on the basis of some factor, such as average rents, any tract 
with a percentile rank between 0 and 33 is in the lowest third; between 
34 and 67 in the middle third; and between 68 and 100 in the upper 
third. Since the percentile rank has been weighted to allow for the 
population of each tract, it can be used to divide the population into 
thirds, fourths, fifths, sixths, eighths, tenths, etc., on the basis of rates 
or percentages for neighborhoods. 

In computing the percentile ranks, the population weights for the 
areas in the array were progressively totaled and printed on the 
tabulated list. This list was then marked off into 100 parts, each of which 
included approximately one per cent of the total population. The 
punched cards were gang punched to indicate the appropriate 
percentile rank on each card. The cards were then sorted into order by 
area and listed so as to indicate the significant information, including 
the percentile rank for each area, as illustrated in Fig. 2. 
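The weighted ranking just described might be sketched as follows. The rule used to assign a hundredth (the cumulative weight through the area, rounded up) is an assumption, since the text says only that the list was marked off into 100 approximately equal parts; the tract names, rents, and weights are hypothetical.

```python
import math


def weighted_percentile_ranks(areas):
    """areas: list of (name, rate, weight); returns {name: percentile rank}."""
    ordered = sorted(areas, key=lambda a: a[1])      # array the areas by rate
    total = sum(weight for _, _, weight in ordered)
    ranks, running = {}, 0
    for name, _, weight in ordered:
        running += weight                            # progressive total of weights
        # Assumed rule: the hundredth of total population in which the
        # cumulative weight through this area falls.
        ranks[name] = math.ceil(100 * running / total)
    return ranks


def third(rank):
    """Group a percentile rank into thirds with the cut points given in the text."""
    return "lowest" if rank <= 33 else "middle" if rank <= 67 else "upper"


# Hypothetical tracts: (name, average rent, 1940 population divided by 100)
tracts = [("A", 21.5, 30), ("B", 34.0, 50), ("C", 27.0, 20), ("D", 48.0, 100)]
ranks = weighted_percentile_ranks(tracts)
groups = {name: third(rank) for name, rank in ranks.items()}
print(ranks, groups)
```

Because the weights are populations, a large tract moves the cumulative total further than a small one, which is what lets the ranks divide the population, and not merely the list of areas, into equal groups.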

After percentages, rates, averages and percentile ranks were com- 
puted for all areas, the cards were assembled and sorted to prepare the 
summary reports for each area for all series. These area reports were 
considered the final reports from the punched cards and were used as 
the copy for the photo-offset sheets bound in four volumes. Instead 
of repeating identical heading stubs for each of the numerous 
area summary tables, one set of stubs is printed on overlapping tab 
cards bound to the right side of the book. Each of the six overlapping 
tab cards matches with one of six columns of data on the area tables, 
which are bound to the left side of the book. This arrangement makes 
the most efficient use of paper and cuts down the size 
and cost of the books. It also facilitates the rapid location of any 
desired information for any area in the county. The tabs are illustrated 
in Fig. 3. 

Three types of information are presented in each column of the table. 
These consist of the actual frequency for the particular series such as 
the number of foreign-born persons; the percentage, such as the per 
cent of the total population which was foreign-born; and the percentile 
rank of the area according to the per cent foreign born. This detail is 
shown for approximately 175 series of data in the city of Pittsburgh. 
The bottom part of each table was used to show frequency distribu- 
tions without any percentages or percentile ranks for such series as 
population by age and sex, persons per household, monthly rent of 
homes, years of school completed, major occupation group, and type 
of structure. Selected information from these series was translated into 
rates, percentages, or averages and was presented also in the upper part 
of each table. 

The preparation of the area reports was greatly facilitated by the 
use of the punched cards to run lists which were photo-offset directly, 
thereby minimizing typographical errors. The series code, gang 
punched into the cards immediately after key punching, was so set up 
as to place the cards correctly on the final area tables. Thus series 101 
was listed on the first line of the first column on the report; series 102 
on the second line of the first column; series 203 on the third line of the 
second column, etc. The final copy for the printer was obtained by 
pasting together two listings for the upper part of the table together 
with the listing for the lower part. Decimal points had to be put in by 
hand and some touch-up work was required. Guide lines were printed 
in red on the final tables to facilitate locating individual series without 
reference to the tabs. 
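The three examples given (101, 102, 203) are consistent with a code whose leading digit is the column and whose last two digits are the line. The sketch below assumes that reading; the card values are hypothetical.

```python
def table_position(series_code):
    """Split a three-digit series code into (column, line) on the area table.

    Assumes the leading digit is the column and the last two digits the line,
    which agrees with the examples 101, 102, and 203 in the text.
    """
    column, line = divmod(series_code, 100)
    return column, line


# Hypothetical cards for one area: (series code, value to be listed)
cards = [(203, "17.2"), (101, "4,250"), (102, "612")]

# Sorting on the series code puts the cards in column order, then line order.
layout = {table_position(code): value for code, value in sorted(cards)}
print(layout)
```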

Considerable interest has been shown in the final volumes by plan- 
ning agencies, real estate companies, schools, newspapers, market 
analysts, and social agencies. One volume contains the ward informa- 
tion for Pittsburgh; another volume contains the study area informa- 
tion for Pittsburgh; the other two volumes are devoted to the 126 
minor civil divisions of Allegheny County. It is expected that the cost 
of publication will be covered by the sale of the books. The cost of the 
analytical work was covered by a special grant from the Buhl Founda- 
tion. 

The basic cards are being retained for special tabulations and listings 
for various studies which may be desired in the future. For example, it 
may be of significance to make summary tabulations of several series of 
data for somewhat different combinations of census tracts than those 
which have been used in the past. To obtain such tabulations, it is a 
relatively simple machine procedure to reproduce the census tract cards 
for the desired series and to gang punch new area codes. The cards can 
then be sorted by the new area codes and tabulated to obtain the totals 
for the new combinations of census tracts. The cards will also be 
available for correlation studies to demonstrate the close relationship 
between such factors as average rents and the birth rate, or poor housing 
and large families. 
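The regrouping procedure described above (attach new area codes, sort, and total) has a direct modern analogue. The following is a minimal sketch with hypothetical tract numbers, frequencies, and area codes; it is not a reconstruction of the actual card layout.

```python
from collections import defaultdict


def summarize(tract_cards, new_area_codes):
    """Total frequency and base by new area code and recompute the percentage."""
    totals = defaultdict(lambda: [0, 0])
    for tract, frequency, base in tract_cards:
        area = new_area_codes[tract]          # the newly "gang punched" code
        totals[area][0] += frequency
        totals[area][1] += base
    return {area: (freq, base, round(100 * freq / base, 1))
            for area, (freq, base) in sorted(totals.items())}


# Hypothetical census tract cards: (tract, frequency, base population)
tract_cards = [("0101", 120, 2400), ("0102", 80, 1600), ("0201", 300, 5000)]

# Hypothetical new combination of tracts into areas X and Y
new_area_codes = {"0101": "X", "0102": "X", "0201": "Y"}

summary = summarize(tract_cards, new_area_codes)
print(summary)
```

Note that the percentage for a combined area is recomputed from the summed frequency and base, not averaged from the tract percentages.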

It is believed that this method of analysis and presentation could be 
applied to data for cities in various size classes or regions, to counties, 
and to metropolitan areas, as well as to small areas within cities. The 
chief difficulties encountered resulted from the constant turnover in 
the personnel engaged in the undertaking, as well as from problems 
of obtaining sufficient tabulating machine facilities. Both of these 
difficulties were unavoidable during the war period but would not be 
appreciable factors in times of peace. 








LINEAR REGRESSION FUNCTIONS WITH 
NEGLECTED VARIABLES* 


Howard L. Jones 
Illinois Bell Telephone Company 


This article discusses some properties of the computed Y 
values obtained by fitting a linear regression function to in- 
dependent observations by the method of least squares. For the 
general case where the form of the fitted function may not be 
correct it is proved that (a) the sampling variance of the com- 
puted values and of the residual differences is the same as for 
the special case where the form of the fitted function is correct, 
and (b) the mean square bias of the set of computed values is 
less than, or equal to, that of any other set of linear estimates. 
These and other properties lead to the suggestion that in 
minimizing the mean square error, one or more variables be 
neglected unless Snedecor’s F is greater than two. 


IN MAKING forecasts on the basis of empirical regression functions 
fitted by the method of least squares, it is a common experience to 
find the forecasting errors to be so large that they can hardly be 
ascribed to chance alone. In fitting such functions, the following pos- 
sibilities should therefore be considered: 

1. The assumption about the errors of observation that underlies 
the least square procedure may not be realistic. In particular, 
the errors in the dependent variable may be serially correlated. 

2. The assumption about the relationship between the variables 
may not be correct. In particular, the complete relationship may 
involve variables not included in the regression function, or the 
form of the relationship between the variables may differ from 
the form of the fitted function. 

There is no satisfactory method now available for determining, from 
internal evidence, that one of these two difficulties, but not the other, 
is present. In fact, the two difficulties may both be present. By refer- 
ring to past experience, however, some indication may be had as to 
which one is likely to be the principal source of trouble. 

* This article is the result of a search for a criterion to use in making a choice among several linear 
estimates of trend ordinates in a time series. Among these estimates were some computed from biased 
functions, developed jointly by John H. Smith and myself, in which least square regression functions 
were treated as independent variables in trying to find the linear function that minimizes the expected 
value of the mean square error. It is a pleasure to acknowledge my debt to Dr. Smith for his 
encouragement at all times. 

Consider, for example, the problem of fitting n terms in a time series 
by using a k-parameter polynomial function of time, 

$$f = a_0 + a_1 t + a_2 t^2 + \cdots + a_{k-1} t^{k-1}.$$


Past experience may indicate that a satisfactory fit to a given set of 
observations can usually be obtained; but that when the polynomial is 
projected much beyond the range of the original observations the fit be- 
comes unsatisfactory. In this situation, suppose it usually happens 
that the poorness of fit is due solely to the computed values of the 
parameters and that, no matter how many homogeneous terms are 
added to the series, a satisfactory fit to the augmented series can still 
be obtained with a polynomial of the same degree (k − 1) by making 
suitable adjustments of the original parameters. Then the principal source 
of difficulty may reasonably be ascribed to the assumption about the 
errors of observation. On the other hand, if it is usually necessary to 
increase the degree of the polynomial in order to obtain a satisfactory 
fit to a longer series of the same kind of data, then at least part of the 
difficulty in fitting these regression functions may be ascribable to the 
assumption about the relationship between the variables. 

In this article, the mean square error and some other properties of 
linear regression functions will be investigated for the case where the 
second difficulty, but not the first, may be present.¹ For this purpose, 
the mean square error is defined to be the mean of the squared dif- 
ferences, at the points actually observed, between the estimates com- 
puted from the regression function and the expected values of the 
dependent variable. For the general case where the form of the fitted 
function may not be correct, the following results will be proved: 

1. The mean square error of the estimates is the sum of two com- 
ponents, the mean sampling variance and a component which 
may be called the mean square bias. 

2. The sampling variance of the estimates, and of their differences 
from the observed values of the dependent variable, is the same 
for the general case as for the special case where the form of the 
fitted regression function is correct. 

3. The mean square bias of the estimates is less than, or equal to, 
the mean square bias of any other linear function of the same 
independent variables. 


1 For a discussion of the mean square error when neither difficulty is present, see Markoff, A. A., 
Wahrscheinlichkeitsrechnung, 1912, pp. 201-228 (translated into German by H. Liebmann from the 
Russian 2nd ed.); also David, F. N., and Neyman, J., “Extension of the Markoff Theorem on Least 
Squares,” Statistical Research Memoirs, vol. 2 (1938), p. 105. For the case where the errors in the de- 
pendent variable are serially correlated, see Aitken, A. C., “On Fitting Polynomials to Data with 
Weighted and Correlated Errors,” Proceedings of the Royal Society of Edinburgh, vol. 54 (1933), p. 12; 
also, Dixon, W. J., “Further Contributions to the Problem of Serial Correlation,” Annals of Mathematical 
Statistics, vol. 15 (1944), p. 119, and references cited on p. 144. 

In the section headed “Practical Applications,” suggestions will be 
offered concerning the use of these properties in actual problems, in- 
cluding the problem of making a choice among several linear regres- 
sion functions where minimum mean square error is a satisfactory 
criterion. Employing the suggested criterion turns out to be equivalent 
to applying a test of significance to the partial regression coefficients 
and then neglecting the corresponding independent variables unless 
Snedecor’s F is greater than two. 
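The threshold of two can be seen from a short calculation. The sketch below assumes that the choice is made by comparing, for each candidate function, the estimate of the quantity defined by equation (9) that Corollaries (I) and (V) suggest in the later sections. The symbols $R_k$ (residual sum of squares of a k-variable function), $q$ (number of variables dropped from the full set of m), and the hat notation are introduced here only for illustration.

```latex
\begin{aligned}
\hat{\theta}^2_k &= \frac{R_k}{n} + \frac{2k\,s^2}{n},
  \qquad s^2 = \frac{R_m}{n-m}, \\[4pt]
\hat{\theta}^2_{m-q} < \hat{\theta}^2_m
  &\iff R_{m-q} + 2(m-q)\,s^2 < R_m + 2m\,s^2 \\[4pt]
  &\iff \frac{(R_{m-q} - R_m)/q}{s^2} < 2 .
\end{aligned}
```

The ratio on the last line is the usual F for testing the q dropped coefficients against the m-variable residual variance, so under this criterion the q variables are neglected when F is less than two and retained when it is greater.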


SPECIFICATIONS AND THEOREMS 

Assume that we are given n observations of the real variables 
$X_1, X_2, \ldots, X_k$ and 

$$Y = \psi + \epsilon \qquad (1)$$

where each ε is a random selection from some universe such that, for 
every observation, 

$$E[X_i \epsilon] = E[\psi \epsilon] = E[\epsilon] = 0, \quad (i = 1, 2, \ldots, k), \qquad (2)$$

and where 

$$\psi = E[Y], \qquad (3)$$

the symbol E denoting expected value. Make no assumption as to the 
relationship between $X_1, X_2, \ldots, X_k$ and ψ. Let 

$$f = B_1 X_1 + B_2 X_2 + \cdots + B_k X_k \qquad (4)$$

be the general expression for one of the n values of the linear 
regression function of the X's, fitted to the n observed values of Y by the 
method of least squares; that is, let the regression coefficients 
$B_1, B_2, \ldots, B_k$ be computed by solving the k normal equations obtained 
by setting the partial derivative of 

$$\sum (B_1 X_1 + B_2 X_2 + \cdots + B_k X_k - Y)^2$$

with respect to each B equal to zero ($\sum$ denoting summation over all 
n observations). Assume that the solution is unique; and that it is 
possible to construct the k orthogonal linear functions 

$$\begin{aligned}
x_1 &= X_1, \\
x_2 &= C_{21} X_1 + X_2, \\
x_3 &= C_{31} X_1 + C_{32} X_2 + X_3, \\
&\;\;\vdots \\
x_k &= C_{k1} X_1 + C_{k2} X_2 + \cdots + C_{k,k-1} X_{k-1} + X_k,
\end{aligned} \qquad (5)$$

with coefficients $C_{21}, C_{31}, \ldots, C_{k,k-1}$ such that 

$$\sum x_i x_j = 0, \qquad \sum x_i^2 \neq 0, \qquad (6)$$

for i = 1, 2, ..., k; j = 2, 3, ..., k; i ≠ j. Let 

$$\phi = E[f] \qquad (7)$$

at a particular observation, for the set of n values of ψ, 
$X_1, X_2, \ldots, X_k$. Let 

$$\sigma^2 = E[\epsilon^2], \qquad (8)$$

and assume σ² to be finite. Then the main theorems to be derived may 
be written 

$$E\left[\sum (f - \psi)^2\right] = \sum (\phi - \psi)^2 + k\sigma^2 \qquad (A)$$

and 

$$E\left[\sum (f - Y)^2\right] = \sum (\phi - \psi)^2 + (n - k)\sigma^2. \qquad (B)$$

Of more practical importance are the corollaries 

$$\sigma^2 \le E\left[\sum (f - Y)^2 / (n - k)\right], \qquad (I)$$

$$\frac{1}{n}\sum (\phi - \psi)^2 = \frac{1}{n} E\left[\sum (f - Y)^2\right] - (n - k)\sigma^2/n, \qquad (II)$$

$$E\left[\frac{1}{n}\sum (f - \psi)^2\right] = \frac{1}{n} E\left[\sum (f - Y)^2\right] - (n - 2k)\sigma^2/n, \qquad (III)$$

and 

$$E\left[\frac{1}{n}\sum (f - \psi)^2\right] = \frac{1}{n}\sum \psi^2 + \text{the sum of } k \text{ terms of the form } \frac{1}{n}\left\{\sigma^2 - \left(\sum x_i \psi\right)^2 \Big/ \sum x_i^2\right\}; \qquad (IV)$$

also, 

$$\theta^2 = \frac{1}{n} E\left[\sum (f - Y)^2\right] + 2k\sigma^2/n \qquad (V)$$

and 

$$\theta^2 = E\left[\frac{1}{n}\sum (f - \psi)^2\right] + \sigma^2 \qquad (VI)$$

where θ² is defined by the relationships 

$$\theta^2 = \frac{1}{n} E\left[\sum (f - Y')^2\right] \qquad (9)$$

and 

$$Y' = \psi + \epsilon', \qquad (10)$$

each ε′ being a random selection from the same universe as the ε's, 
but independent of the ε's included in the Y's that were used in 
computing the B's in equation (4). 

DERIVATION 

Assume that the coefficients $b_1, b_2, \ldots, b_k$ in the regression function 

$$b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$

have been computed by the method of least squares so as to minimize 

$$\sum (b_1 x_1 + b_2 x_2 + \cdots + b_k x_k - Y)^2.$$

(In solving actual problems it is not necessary to make the 
transformation to orthogonal functions; but the further discussion is 
simplified if we assume that it has been made.) Then from (6) we see the 
normal equations take the form 

$$\begin{aligned}
b_1 \sum x_1^2 &= \sum x_1 Y, \\
b_2 \sum x_2^2 &= \sum x_2 Y, \\
&\;\;\vdots \\
b_k \sum x_k^2 &= \sum x_k Y.
\end{aligned} \qquad (11)$$

It is known that under these conditions, 

$$b_1 x_1 + b_2 x_2 + \cdots + b_k x_k = f \qquad (12)$$

where f is defined by equation (4). 

We are now ready to derive the sampling variance of f. From 
equations (11) and (12), we have² 

$$f = \frac{x_1 \sum x_1 Y}{\sum x_1^2} + \frac{x_2 \sum x_2 Y}{\sum x_2^2} + \cdots + \frac{x_k \sum x_k Y}{\sum x_k^2}. \qquad (13)$$

Replacing Y with ψ + ε, we obtain 

$$f = \frac{x_1 \sum x_1 \psi}{\sum x_1^2} + \frac{x_2 \sum x_2 \psi}{\sum x_2^2} + \cdots + \frac{x_k \sum x_k \psi}{\sum x_k^2} + \frac{x_1 \sum x_1 \epsilon}{\sum x_1^2} + \frac{x_2 \sum x_2 \epsilon}{\sum x_2^2} + \cdots + \frac{x_k \sum x_k \epsilon}{\sum x_k^2}. \qquad (14)$$

From (7), (14), (5), and (2), we have 

$$\phi = \frac{x_1 \sum x_1 \psi}{\sum x_1^2} + \frac{x_2 \sum x_2 \psi}{\sum x_2^2} + \cdots + \frac{x_k \sum x_k \psi}{\sum x_k^2}. \qquad (15)$$

At a particular observation the sampling error in f is, by definition, 
f − E[f]; that is, f − φ. Let 

$$\zeta = f - \phi. \qquad (16)$$

Then from (16), (14), and (15), 

$$\zeta = \frac{x_1 \sum x_1 \epsilon}{\sum x_1^2} + \frac{x_2 \sum x_2 \epsilon}{\sum x_2^2} + \cdots + \frac{x_k \sum x_k \epsilon}{\sum x_k^2}. \qquad (17)$$

From (5), (17), and (2), 

$$E[x_i \zeta] = E[\zeta] = 0 \qquad (18)$$

for every $x_i$ and every observation. Also, from the specification that 
ε is selected at random, we have 

$$E[\epsilon_r \epsilon_s] = 0 \qquad (19)$$

where the subscripts r and s denote different observations. From (8) 
and (19), the expected value of the square of (17) may therefore be 
written 

$$\begin{aligned}
E[\zeta^2] ={}& \frac{x_1^2 \sum x_1^2 \sigma^2}{\left(\sum x_1^2\right)^2} + \frac{x_2^2 \sum x_2^2 \sigma^2}{\left(\sum x_2^2\right)^2} + \cdots + \frac{x_k^2 \sum x_k^2 \sigma^2}{\left(\sum x_k^2\right)^2} \\
&+ \frac{2 x_1 x_2 \sum x_1 x_2 \sigma^2}{\sum x_1^2 \sum x_2^2} + \frac{2 x_1 x_3 \sum x_1 x_3 \sigma^2}{\sum x_1^2 \sum x_3^2} + \cdots + \frac{2 x_{k-1} x_k \sum x_{k-1} x_k \sigma^2}{\sum x_{k-1}^2 \sum x_k^2}
\end{aligned}$$

or, from the orthogonal property of the x's, equations (6), 

$$E[\zeta^2] = \sigma^2 \left[\frac{x_1^2}{\sum x_1^2} + \frac{x_2^2}{\sum x_2^2} + \cdots + \frac{x_k^2}{\sum x_k^2}\right]. \qquad (20)$$

This is the sampling variance of f at a particular observation. Summing 
over all n observations yields 

$$E\left[\sum \zeta^2\right] = k\sigma^2. \qquad (21)$$

2 Compare Davis, H. T., The Analysis of Economic Time Series, 1941, p. 96, equation (3); Jones, 
H. L., “Fitting Polynomial Trends to Seasonal Data by the Method of Least Squares,” Journal of the 
American Statistical Association, vol. 38 (1943), p. 453, equations (7) and (8). 

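The orthogonal construction and the variance result lend themselves to a small numerical illustration. The sketch below builds the orthogonal functions of equations (5) and (6) by successive projection, forms the fitted values as in equations (11) to (13), and evaluates the bracketed multiplier of σ² in equation (20) at each observation. The data and the function names are hypothetical and are introduced only for this example.

```python
def orthogonalize(columns):
    """Build x_1, ..., x_k as in equations (5) and (6) by successive projection."""
    ortho = []
    for X in columns:
        x = [float(v) for v in X]
        for prev in ortho:
            # coefficient that removes the component of X along an earlier x
            c = sum(a * b for a, b in zip(X, prev)) / sum(b * b for b in prev)
            x = [xi - c * pi for xi, pi in zip(x, prev)]
        ortho.append(x)
    return ortho


def fitted_values(ortho, Y):
    """f = sum over i of x_i * (sum x_i Y) / (sum x_i^2), equations (11) to (13)."""
    f = [0.0] * len(Y)
    for x in ortho:
        b = sum(xi * yi for xi, yi in zip(x, Y)) / sum(xi * xi for xi in x)
        f = [fi + b * xi for fi, xi in zip(f, x)]
    return f


def variance_factors(ortho):
    """Multiplier of sigma^2 in equation (20), one value per observation."""
    sums = [sum(v * v for v in x) for x in ortho]
    n = len(ortho[0])
    return [sum(x[r] ** 2 / s for x, s in zip(ortho, sums)) for r in range(n)]


# Hypothetical data: a constant term (X_1 = 1) and time (X_2 = t).
t = [1, 2, 3, 4, 5, 6]
columns = [[1] * len(t), t]
Y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

x1, x2 = orthogonalize(columns)
f = fitted_values([x1, x2], Y)
factors = variance_factors([x1, x2])

# Summed over all observations the factors total k (here 2), as in (21),
# apart from floating-point rounding.
print(sum(factors))
```

The per-observation factors are largest at the ends of the series, which is the familiar widening of the sampling variance away from the center of the data.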
We next derive the sampling covariance of Y and f. This may be 
defined by the equation 

$$\pi[Yf] = E[Yf] - \{E[Y]\}\{E[f]\}$$

where π denotes sampling covariance. In this case, by substituting the 
equivalent values of Y, f, E[Y], and E[f] as given by (1), (16), (3), 
and (7), and by using (17), (5), (2), and (15) to eliminate terms equal 
to zero, we obtain 

$$\pi[Yf] = E[\psi\zeta + \epsilon\phi + \epsilon\zeta] = E[\epsilon\zeta]. \qquad (22)$$

Multiplying equation (17) by ε, we obtain 

$$\epsilon\zeta = \frac{x_1 \epsilon \sum x_1 \epsilon}{\sum x_1^2} + \frac{x_2 \epsilon \sum x_2 \epsilon}{\sum x_2^2} + \cdots + \frac{x_k \epsilon \sum x_k \epsilon}{\sum x_k^2},$$

whence, from (8), (19), and (20), 

$$E[\epsilon\zeta] = \sigma^2 \left[\frac{x_1^2}{\sum x_1^2} + \frac{x_2^2}{\sum x_2^2} + \cdots + \frac{x_k^2}{\sum x_k^2}\right] = E[\zeta^2]; \qquad (23)$$

that is, at each observation, the sampling covariance of Y and f is the 
same as the sampling variance of f. We have immediately 

$$E\left[\sum \epsilon\zeta\right] = E\left[\sum \zeta^2\right] = k\sigma^2. \qquad (24)$$

From (16), the expected value of the sum of the squared errors in 
f, measured from ψ, may be written 

$$E\left[\sum (f - \psi)^2\right] = E\left[\sum (\phi - \psi + \zeta)^2\right] = E\left[\sum (\phi - \psi)^2 + \sum \zeta^2 + 2\sum \zeta(\phi - \psi)\right].$$

From (21), (17), (15), (18), and (2), this equation becomes 

$$E\left[\sum (f - \psi)^2\right] = \sum (\phi - \psi)^2 + k\sigma^2, \qquad (25)$$

which completes the derivation of Theorem (A). 

Expanding the expression for the expected value of the sum of the 
squared residuals, we write 

$$E\left[\sum (f - Y)^2\right] = E\left[\sum (f - \psi - \epsilon)^2\right] = E\left[\sum (f - \psi)^2 + \sum \epsilon^2 - 2\sum \epsilon(\phi - \psi) - 2\sum \epsilon\zeta\right]. \qquad (26)$$

From (15), (5), and (2), 

$$E\left[\sum \epsilon(\phi - \psi)\right] = 0. \qquad (27)$$

We have, therefore, from the combination of (26) with (25), (8), 
(27), and (24), 

$$E\left[\sum (f - Y)^2\right] = \sum (\phi - \psi)^2 + (n - k)\sigma^2, \qquad (28)$$

which completes the derivation of Theorem (B). 
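Theorems (A) and (B) can be checked exactly on a small hypothetical case. The sketch below assumes a particular error universe chosen for convenience, in which each ε is +σ or −σ with equal probability and independently between observations; this satisfies the specifications (2), (8), and (19), and it allows the expectations to be computed by enumerating every sign pattern instead of sampling. The data, and the choice of a straight line fitted to a quadratic set of expected values, are illustrative only.

```python
from itertools import product


def least_squares_fit(columns, Y):
    """Fitted values f obtained through successive orthogonal functions."""
    ortho, f = [], [0.0] * len(Y)
    for X in columns:
        x = [float(v) for v in X]
        for prev in ortho:
            c = sum(a * b for a, b in zip(X, prev)) / sum(b * b for b in prev)
            x = [xi - c * pi for xi, pi in zip(x, prev)]
        ortho.append(x)
        b = sum(xi * yi for xi, yi in zip(x, Y)) / sum(xi * xi for xi in x)
        f = [fi + b * xi for fi, xi in zip(f, x)]
    return f


n, k, sigma = 6, 2, 1.5
t = [1, 2, 3, 4, 5, 6]
columns = [[1] * n, t]                      # X_1 = 1, X_2 = t
psi = [0.5 * v * v for v in t]              # quadratic: the fitted line is the wrong form

phi = least_squares_fit(columns, psi)       # phi = E[f], by linearity of the fit
bias_ss = sum((p - q) ** 2 for p, q in zip(phi, psi))

# Average over all 2^n equally likely sign patterns of the errors.
patterns = list(product((-sigma, sigma), repeat=n))
sum_a = sum_b = 0.0
for eps in patterns:
    Y = [p + e for p, e in zip(psi, eps)]
    f = least_squares_fit(columns, Y)
    sum_a += sum((fi - p) ** 2 for fi, p in zip(f, psi))
    sum_b += sum((fi - y) ** 2 for fi, y in zip(f, Y))

lhs_a, rhs_a = sum_a / len(patterns), bias_ss + k * sigma ** 2
lhs_b, rhs_b = sum_b / len(patterns), bias_ss + (n - k) * sigma ** 2
print(lhs_a, rhs_a)
print(lhs_b, rhs_b)
```

Each printed pair should agree apart from floating-point rounding, even though the fitted function is deliberately of the wrong form, which is the content of the two theorems.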

Corollaries (I), (II), and (III) are easily derived from Theorems 
(A) and (B). 

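The bias of the regression function, f, at a particular observation is 
defined to be the difference φ − ψ. Its mean square for all n observations 
is given by Corollary (II). Some additional properties will now be 
derived. From (15) and (6) we have 

$$\sum \phi^2 = \frac{\left(\sum x_1 \psi\right)^2}{\sum x_1^2} + \frac{\left(\sum x_2 \psi\right)^2}{\sum x_2^2} + \cdots + \frac{\left(\sum x_k \psi\right)^2}{\sum x_k^2} = \sum \phi\psi, \qquad (29)$$

whence 

$$\sum (\phi - \psi)^2 = \sum \psi^2 - \frac{\left(\sum x_1 \psi\right)^2}{\sum x_1^2} - \frac{\left(\sum x_2 \psi\right)^2}{\sum x_2^2} - \cdots - \frac{\left(\sum x_k \psi\right)^2}{\sum x_k^2}. \qquad (30)$$

Combining (30) with Theorem (A), and dividing by n, leads to Corollary 
(IV). 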

It is to be noted that if a set of orthogonal functions $x_1, x_2, \ldots, x_i$, 
defined by equations (5) and (6), has been computed for n observations 
of $X_1, X_2, \ldots, X_i$, the addition of a new variable $X_{i+1}$ and then 
transforming to a new set of $x_1, x_2, \ldots, x_{i+1}$, satisfying (5) and (6), 
requires the computation of but one new orthogonal function, $x_{i+1}$, 
the functions $x_1, x_2, \ldots, x_i$ remaining the same as before. From this 
fact and equation (30), it follows that adding a variable $X_{i+1}$ to an 
original set of variables $X_1, X_2, \ldots, X_i$ and then recomputing a 
least square linear regression function of the variables cannot possibly 
increase the mean square bias, since each term following $\sum \psi^2$ in 
equation (30) must be negative or zero. 
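The monotone behavior of the mean square bias can be illustrated numerically. The sketch below computes φ as the least square fit to the expected values themselves, as equation (15) allows, and then evaluates the mean square bias for one, two, and three variables. The cubic set of expected values and the function names are hypothetical.

```python
def projection(columns, target):
    """Least square linear combination of the columns closest to target."""
    ortho, fit = [], [0.0] * len(target)
    for X in columns:
        x = [float(v) for v in X]
        for prev in ortho:
            c = sum(a * b for a, b in zip(X, prev)) / sum(b * b for b in prev)
            x = [xi - c * pi for xi, pi in zip(x, prev)]
        ortho.append(x)
        b = sum(xi * ti for xi, ti in zip(x, target)) / sum(xi * xi for xi in x)
        fit = [fi + b * xi for fi, xi in zip(fit, x)]
    return fit


def mean_square_bias(columns, psi):
    """(1/n) * sum of (phi - psi)^2, with phi computed as in equation (15)."""
    phi = projection(columns, psi)
    return sum((p - q) ** 2 for p, q in zip(phi, psi)) / len(psi)


t = [1, 2, 3, 4, 5, 6, 7, 8]
psi = [0.1 * v ** 3 for v in t]                       # hypothetical cubic expected values
candidates = [[1] * len(t), t, [v ** 2 for v in t]]   # X_1 = 1, X_2 = t, X_3 = t^2

# Mean square bias with the first one, two, and three variables.
biases = [mean_square_bias(candidates[:k], psi) for k in (1, 2, 3)]
print(biases)
```

Each added variable subtracts one more non-negative term, as in equation (30), so the printed values cannot increase from left to right; here the last remains positive because a quadratic cannot reproduce a cubic.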

A more general statement follows from a comparison of equations 
(13) and (15). Since f and φ are seen to be of the same functional form, 
it is obvious that among all functions of the linear form 

$$\Phi = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k \qquad (31)$$

the particular one that minimizes $(1/n)\sum (\Phi - \psi)^2$ is φ as defined by 
(15). In other words, if 

$$\phi' = \beta_1' X_1 + \beta_2' X_2 + \cdots + \beta_k' X_k \qquad (32)$$

is a linear function of $X_1, X_2, \ldots, X_k$ only, such that 

$$\phi' \neq \phi \qquad (33)$$

for at least one observation, then 

$$\sum (\phi - \psi)^2 < \sum (\phi' - \psi)^2. \qquad (34)$$

(A linear function with a constant term is to be regarded as the special 
case where $X_1 = 1$; otherwise, it might be an exception.) 

To derive Corollary (V), we first expand $E[\sum (f - Y')^2]$, obtaining 

$$\begin{aligned}
E\left[\sum (f - Y')^2\right] &= E\left[\sum (\phi + \zeta - \psi - \epsilon')^2\right] \\
&= E\left[\sum (\phi - \psi)^2 + \sum \zeta^2 + \sum (\epsilon')^2 + 2\sum \zeta(\phi - \psi) - 2\sum \epsilon'(\phi - \psi) - 2\sum \zeta\epsilon'\right].
\end{aligned} \qquad (35)$$

From the specifications of ε′ following equation (10), we write 

$$E[\zeta\epsilon'] = E[\epsilon'] = 0, \qquad (36)$$

and 

$$E[(\epsilon')^2] = \sigma^2. \qquad (37)$$

Equation (35) therefore reduces to 

$$E\left[\sum (f - Y')^2\right] = \sum (\phi - \psi)^2 + (n + k)\sigma^2. \qquad (38)$$

Combining (38) with Theorem (B) yields 

$$E\left[\sum (f - Y')^2\right] = E\left[\sum (f - Y)^2\right] + 2k\sigma^2. \qquad (39)$$

Dividing by n, and combining with (9), completes the derivation of 
Corollary (V). Combining (III) and (V) leads to Corollary (VI). 


PRACTICAL APPLICATIONS 


The theorems and corollaries in the preceding sections were derived 
for the general case where the relationship between the independent 
variables and the expected value of the dependent variable is not 
specified. This case is broad enough to cover situations where the 
linear regression function neglects some of the independent variables, 
including terms of the second and higher degrees in the variables that 
do appear in this function. More simply, we can always think of the 
differences between the expected values of the dependent variable and 
the expected values of a linear regression function, when such 
differences exist, as constituting a set of values of a single neglected 
variable. In the following paragraphs, the application of the theorems 
and corollaries to practical regression problems will be considered for 
the case where one or more variables are neglected. 

Suppose several independent variables are being considered, and 
it is desired to investigate various combinations for predicting the 
dependent variable or estimating its expected value. Let m be the num- 
ber of such independent variables (including X,=1 if the regression 
function is to include a constant term, B,), and let n be the number of 
observations. (It is desirable that n − m be large.) We may first 
compute the sum of the squared residuals for the least square linear 
regression function of all m variables and divide by n − m. Let us 
designate the quotient s². Then from Corollary (I) we know that the 
expected value of s² is equal to or greater than σ², the sampling variance 
of the dependent variable. In this sense, s² is a conservative estimate 
of σ². 

We see from inequality (34) that the m-variable function has a mean 
square bias less than, or equal to, the mean square bias of any other 
function of the same m variables that is of the same linear form—for 
example, any such function where one or more of the variables are, in 
effect, neglected by setting their coefficients equal to zero. Suppose we 
assume the m-variable function to be unbiased. We can then use 
Corollary (II) to estimate the mean square bias of any other linear 
regression function, of any k variables. We compute the sum of the 
squared residuals for this k-variable function, and subtract (n—k)s? 
where s? is the estimate of o? previously obtained by employing all m 
variables. The remainder, divided by n, is the estimated mean square 
bias. If the quotient should turn out to be negative for a regression 
function which employs no variable not included among the inde- 
pendent variables used in computing s?, it should usually be ascribed 
to chance and not interpreted as evidence that the k-variable function 
has a smaller mean square bias than the m-variable function, as 
inequality (34) shows this latter interpretation to be incorrect under 
the assumed conditions. Note that if the estimate of o* is too large, 
the usual tendency, the mean square bias will tend to be underes- 
timated for all the regression functions, the expected value of the error 
being proportional to (n—k). In many comparisons, however, these 
estimates may nevertheless be indicative of the rank of different 
regression functions with respect to bias. 
To estimate the mean square error, as measured from the expected 
values of the dependent variable, Corollary (III) suggests that the 
mean square residual be reduced by (n − 2k)s²/n, where s² is computed 
as outlined above. The special case where k = ½n is interesting, for then 
the mean square error and the mean square residual have the same 
expected value. If the m-variable function used to compute s² is biased, 
however, the estimates of the mean square error will tend to be too 
low for k < ½n, and too high for k > ½n. For the m-variable function, 
the combination of Corollaries (I) and (III) suggests estimating the 
mean square error by multiplying the mean square residual by 
m/(n − m). As before, the resulting estimate will be unbiased if the 
regression function is unbiased at every observation; otherwise, the 
estimate will tend to be too low for m < ½n, and too high for m > ½n. 
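
A short sketch, with hypothetical figures and a hypothetical function name, may make the two statements above concrete: the general rule reduces the mean square residual by (n − 2k)s²/n, and for the m-variable function itself this is the same as multiplying the mean square residual by m/(n − m).

```python
def estimated_mean_square_error(rss_k, n, k, s2):
    """Mean square residual reduced by (n - 2k) * s2 / n."""
    return rss_k / n - (n - 2 * k) * s2 / n


# Hypothetical m-variable fit: n = 20, m = 5, residual sum of squares 60.
n, m, rss_m = 20, 5, 60.0
s2 = rss_m / (n - m)

direct = estimated_mean_square_error(rss_m, n, m, s2)
shortcut = (rss_m / n) * m / (n - m)   # mean square residual times m/(n - m)

# With k = n/2 the correction vanishes and the mean square residual is returned.
half = estimated_mean_square_error(90.0, n, n // 2, s2)
print(direct, shortcut, half)
```
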

In many problems of estimating a variable, it is useful to estimate 
the quantity δ as defined by equation (9). It is the expected value of 
the square of the standard error in estimating a new set of values of the 
dependent variable where these values are assumed to be constructed 
by replacing the random elements in the original observations by new 
random selections from the same universe. In general, δ² is equal to the 
sum of (1) the expected value of the mean square of the differences between 
the estimates and the observed values of the dependent variable, 
and (2) twice the mean sampling covariance of these observed values 
and the estimates.³ 

3 It is difficult to think of a suitable name for δ. Perhaps standard residual would do. 

To estimate δ² for a least square linear regression function, Corollaries 
(I) and (V) suggest that the mean square residual be increased 
by 2ks²/n, where s² is computed from the m-variable regression function 
previously defined. In making a choice among several regression functions, 
δ² may be estimated for each function, using the same s², to determine 
the function for which the estimate of δ² is smallest. For the m-variable 
function, we may estimate δ² by multiplying the mean square 
residual by (n + m)/(n − m). Corollary (VI) shows that δ² differs from 
the expected value of the mean square error by the constant σ². It is 
therefore immaterial whether one or the other of these two criteria is 
used in choosing a regression function, provided the same estimate of σ² 
is used. In either case, if σ² is overestimated, the selection will be biased 
in favor of the regression function with the smaller number of variables. 
However, estimates of δ² have the important property, not possessed 
by estimates of the mean square error, that their expected values are 
always equal to or greater than the quantities that are being estimated, 
for a random selection of regression function under the assumed conditions 
as to the random component of the dependent variable. In this 
sense they are conservative estimates. When the estimates themselves 
are used to determine the choice of function, however, there may be a 
tendency to underestimate δ² for the function chosen. 
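
The selection rule of the preceding paragraph can be illustrated by a small sketch. The residual sums of squares below are hypothetical, as is the function name; the same s², taken from the m-variable function, is used for every candidate, and the candidate with the smallest estimate is chosen.

```python
def estimated_delta2(rss_k, n, k, s2):
    """Mean square residual increased by 2k * s2 / n."""
    return rss_k / n + 2 * k * s2 / n


# Hypothetical nested fits: residual sum of squares using the first k variables.
n, m = 20, 5
rss = {1: 150.0, 2: 90.0, 3: 80.0, 4: 64.0, 5: 60.0}
s2 = rss[m] / (n - m)          # from the m-variable function

estimates = {k: estimated_delta2(rss[k], n, k, s2) for k in rss}
best_k = min(estimates, key=estimates.get)

# For the m-variable function the estimate is the mean square residual
# multiplied by (n + m)/(n - m).
shortcut_m = (rss[m] / n) * (n + m) / (n - m)
print(estimates, best_k, shortcut_m)
```

In this invented example the four-variable function is chosen, although the five-variable function has the smaller sum of squared residuals.
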

There is an interesting relationship between the estimated value of 
δ², as a criterion for deciding which variables to neglect, and the usual 
tests of significance in multiple correlation and the analysis of variance 
where the null hypothesis states that the partial regression coefficient 
of the expected values of the dependent variable on the observed values 
of an independent variable is zero. The δ criterion rejects the independent 
variable (that is, sets its coefficient in the estimating function 
equal to zero) unless t², or F, is greater than 2, where t and F refer to 
“Student’s” t and Snedecor’s F, respectively. To make this clear, let 
us suppose that we are given n observations of the dependent variable 
Y and the independent variables X_1, X_2, ..., X_k, ..., X_l, ..., X_m, 
k < l ≤ m, and that we have computed the linear regression functions 
f_k, f_l, and f_m from the first k, the first l, and all m variables, respectively. 
Then if d_k² and d_l² are estimates of δ² for f_k and f_l, computed from the 
formulas 

    s^2 = \sum (f_m - Y)^2 / (n - m),                                  (43)

    d_k^2 = \frac{1}{n} \left[ \sum (f_k - Y)^2 + 2k s^2 \right],       (44)

and 

    d_l^2 = \frac{1}{n} \left[ \sum (f_l - Y)^2 + 2l s^2 \right],       (45)

and if we compute 

    F = \frac{\left[ \sum (f_k - Y)^2 - \sum (f_l - Y)^2 \right] / (l - k)}{\sum (f_m - Y)^2 / (n - m)},   (46)

it can easily be shown that d_l² < d_k² if and only if F > 2. The critical value 
of F (or t²) corresponds to fairly large probabilities with respect to the 
null hypothesis referred to at the beginning of this paragraph. 

The proper choice of criterion depends, of course, upon the purpose 
in applying it. If it is important not to include an independent variable 
in the estimating function unless there is strong evidence that the partial 
correlation of the variable with the expected values of the dependent 
variable is different from zero, the t or F test may be applied 
in the search for such evidence. On the other hand, if it is important to 
have unbiased estimates, then it must be kept in mind that a value of 
t or F greater than one supplies some evidence that setting the tested 
coefficient or coefficients equal to zero will result in biased estimates; 
and that there is no test, based on sampling theory, which supplies 
satisfactory evidence that the “true” partial regression coefficient is 
exactly zero (except in special situations, as when admissible hypotheses 
state the coefficient to be zero or one). In deciding which 
variables to neglect, the δ criterion should not be used in attempting 
to answer the question as to whether the “true” regression coefficient 
is different from zero, but only in trying to find the function that results 
in estimates with the smallest mean square error. 
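
The equivalence between the comparison of the two estimates from formulas (44) and (45) and the condition F > 2 from formula (46) may be checked numerically. The sketch below uses hypothetical sums of squared residuals and hypothetical function names; it is an illustration only, not a proof.

```python
def d_squared(rss, n, k, s2):
    """Formulas (44) and (45): estimate for a k-variable function."""
    return (rss + 2 * k * s2) / n


def f_ratio(rss_k, rss_l, k, l, s2):
    """Formula (46), with s2 given by formula (43)."""
    return ((rss_k - rss_l) / (l - k)) / s2


n, m, k, l = 20, 5, 2, 4
s2 = 60.0 / (n - m)            # formula (43) with a hypothetical sum of 60
rss_k = 90.0
checks = []
for rss_l in (85.0, 80.0, 70.0, 62.0):   # hypothetical l-variable fits
    prefers_l = d_squared(rss_l, n, l, s2) < d_squared(rss_k, n, k, s2)
    F = f_ratio(rss_k, rss_l, k, l, s2)
    checks.append((F, prefers_l))
print(checks)
```
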

Corollary (IV) throws light on the problem of choosing a criterion. 
It shows that adding one more variable, X_j, to a linear regression function 
reduces the expected value of its mean square error if and only if 


(DU 2h)*/ Do 2? > 0°, (47) 


the left-hand member of this inequality, divided by n, representing the 
reduction in the mean square bias, and the right-hand member, divided 
by n, representing the increase in the mean sampling variance. 
In testing a single variable, X_j, the objective in using the δ criterion 
is to retain this variable in the regression function when inequality 
(47) is true, and to neglect X_j when (47) is false. An underlying premise 
is that the risk of accepting (47) when it is false is of about the same 
importance as the risk of rejecting (47) when it is true. On the other 
hand, the objective in using the F or t test is to neglect the variable 
X_j in the regression function when 


# zw)?/ >> z;? = ps ry = 0, (48) 


and to retain the variable otherwise. The common use of this test with 
a small probability as the basis for making decisions rests on the 
premise that the risk of rejecting (48) when it is true is very important, 
while the risk of accepting (48) when it is false is comparatively unimportant. 
On account of the difference in the relative importance attached 
to the two kinds of risks, the F test sometimes leads to the non-rejection 
of (48) where the δ criterion leads to the acceptance of (47). 
As usually employed, the F test frequently tells us to retain a variable 
in the regression function, but never tells us to neglect one. The δ 
criterion does both, the decision to retain a variable depending on 
whether the gain in reducing the mean square bias appears to outweigh 
the increase in the mean sampling variance. 

In fitting empirical functions to a set of observations there is no 
assurance, of course, that δ² or any other criterion will be satisfactory 
for unobserved points outside or inside the region of the observations, 
or even for recurrences of the observed combinations of values of the 
independent variables if some variable not considered affects the dependent 
variable. There are also problems where our assumptions as to 
the random component will not be sufficiently realistic. Only practical 
experience can determine which assumptions and what criteria are 
most useful for any given type of problem in a particular field of research. 

Linear restrictions. In computing partial regression coefficients B_1, 
B_2, ..., B_m, it is sometimes desirable to impose the condition that 
the computed values satisfy c consistent independent equations of the 
form 

    a_{i1} B_1 + a_{i2} B_2 + \cdots + a_{im} B_m = \gamma_i \quad (i = 1, 2, \ldots, c),   (49)

where the a’s and γ’s are specified constants. A special case is that 
where these equations are all of the simple form 

    B_j = 0.   (50)

Imposing the condition that the computed values of the B’s satisfy c 
equations of this simple form is equivalent, of course, to neglecting c 
independent variables in computing the regression function. 

In general, it can be shown⁴ that requiring the solution for the regression 
coefficients to satisfy c equations of form (49) is equivalent to 
making a linear transformation of the variables Y, X_1, X_2, ..., X_m 
and then neglecting c transformed independent variables. It can also be 
shown that if, for every observation, equations (2) and (3) are true for 
y, ε, and every X_j in terms of the original variables, they are also true 
in terms of the transformed variables, σ² and each ε remaining unchanged. 
Hence, if the transformed variables can be orthogonalized, 
all our theorems and corollaries hold, provided k be replaced with m − c. 

4 See Wilks, S. S., Mathematical Statistics, 1943, pp. 171-172. There it is specified that the random 
component in the dependent variable is normally distributed about a linear function of the independent 
variables. The discussion of the transformation is sufficiently general, however, to cover the case considered 
in this article where neither the functional relationship between the variables nor the form of the 
distribution of the random component is specified. 

Incidentally, this discussion settles a question raised elsewhere by the present writer as to whether 
a least square solution results in the unbiased estimate with minimum sampling variance when the solution 
is subject to a linear restriction on the parameters. See reference in footnote 3 supra, p. 459. 








THE ANALYSIS OF LATIN SQUARES WHEN SOME 
OBSERVATIONS ARE MISSING* 


D. B. De Lury 
Virginia Polytechnic Institute 


The discussion of the missing-value problem is given explicitly 
for a biological assay which employs a 4×4 latin 
square in several replications. However, the methods are 
easily adapted to any latin square and to various other designs 
as well. 

Methods of analysis, when some observations are missing, 
are discussed for the following cases. 

(1) One or more “single” observations are missing. 
(2) Several columns are missing. 
(3) One column is missing. 
(4) Two columns are missing. 
(5) One column and one or more single observations are missing. 

INTRODUCTION. Devices which can be used to simplify the statistical 
problem when some of the observations are missing from a latin 
square have been discussed by Yates [1] and by Yates and Hale [2]. 
These discussions cover most of the cases of practical importance, but 
in certain situations, for example, when a latin square of fixed size is 
in routine use, the methods and results given in these papers can be 
reorganized so as to increase considerably the simplicity of the formulae 
and of the arithmetic required for their application. The latin square 
arrangement suggested by Bliss and Marks [4] for use in the assay of 
insulin illustrates such a situation. This paper discusses the analysis, 
when some observations are missing, of this specific design, which is 
described in section 1. Modifications appropriate to other designs should 
be fairly obvious. 

1. This assay is designed to compare the potencies of two preparations, 
which may conveniently be called the “standard” and the “unknown.” 
Each preparation is administered at two dosage levels, which 
usually are made the same for both preparations under the assumption 
that the potencies are equal. Each experimental animal is given each of 
the four doses on four different days. The observations on four animals 
are arranged in a latin square, “rows” corresponding to days, “columns” 
to animals, and “treatments” to doses. This latin square is repeated 
several times, using different sets of animals, each set with its own random 
arrangement of doses, but administering the doses on the same 
days. Thus if r replications are used, the assay involves 4r animals and 
yields 16r observations. 

* A paper presented at the 105th Annual Meeting of the American Statistical Association, to a 
joint session of the Biometrics Section and the Institute of Mathematical Statistics, Cleveland, January 
25, 1946. 

Assuming that the effects of rows, columns and treatments combine 
additively, the analysis of variance table reads as follows when no ob- 
servation is missing. 


Sources of variation Degrees of freedom 
columns (animals) 4r-1 
rows (days) 3 
treatments (doses) 3 
residual (error) 12r-6 


The three degrees of freedom exhibiting differences among treatments 
are subdivided into 


1 for the distance between the two dose-response lines, (D) 

1 for the common slope of the two lines, (B) 

1 for the departure from linearity of the dose-response curve over 
the range of the test (C). 


Since the response is linearly related to the logarithm of the dose, the 
difference between the logarithms of the potencies of the two preparations 
is given by I D/B, where I represents the difference between the 
logarithms of the two doses. ([4] page 188.) Validity of the assay requires 
that B differ significantly from zero and that C be not significantly 
different from zero. 

These calculations may be illustrated on the data of Table I, which 
gives the observations, in milligrams of glucose per 100 cc. of blood, 
obtained in a rabbit assay of insulin. The latin square arrangement, 
with 4 replications, was used. The treatments, assuming the same potency 
for unknown and standard, were: 

treatment 1: 0.30 units of the unknown preparation, 
treatment 2: 0.60 units of the unknown preparation, 
treatment 3: 0.30 units of the standard preparation, 
treatment 4: 0.60 units of the standard preparation. 

The number of the treatment is recorded under each observation. 

The analysis of variance table is made up using the ordinary rules for 
such computations. The sum of squares arising from differences among 
columns is given by [(210)² + (246)² + ... + (288)²]/4 − (3782)²/64, 
and similar calculations furnish the sums of squares for rows and treatments. 
The residual sum of squares is obtained by subtracting from 
the “total” sum of squares, those for columns, rows, and treatments. 


              d.f.      s.s.     m.s. 
columns        15     10,327 
rows            3        937 
treatments      3      3,486 
residual       42      2,589    61.65 
total          63     17,339 


No importance is attached to testing for significance the treatment 
sum of squares, but since the validity of the assay depends on the 
significance or non-significance of the quantities B and C, these must 
be tested. It is convenient to calculate B, C and D according to the 
following scheme. ([4], page 184.) 


treatment number       1      2      3      4 

treatment total     1078    845   1047    812    divisors     value 
D                     +1     +1     −1     −1    4√r = 8     + 8.00 
B                     −1     +1     −1     +1    4√r = 8     −58.50 
C                     −1     +1     +1     −1    4√r = 8     + 0.25 


This array implies that D = [(1078)(+1) + (845)(+1) + (1047)(−1) + 
(812)(−1)]/8 = 8.00, etc. 
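
The scheme may be reproduced in a few lines. The sketch below takes the treatment totals 1078, 845, 1047 and 812 (which sum to the grand total 3782 used earlier) and assumes the divisor to be 4√r, which equals 8 for the r = 4 replications of the example; the variable names are illustrative only.

```python
import math

r = 4                                   # replications in the example
totals = [1078, 845, 1047, 812]         # treatment totals 1 to 4
signs = {
    "D": (+1, +1, -1, -1),
    "B": (-1, +1, -1, +1),
    "C": (-1, +1, +1, -1),
}
divisor = 4 * math.sqrt(r)              # equals 8 when r = 4

contrasts = {
    name: sum(s * t for s, t in zip(coeffs, totals)) / divisor
    for name, coeffs in signs.items()
}
print(contrasts)
```
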

The value of B may be tested either by calculating t = |B|/s, where 
s² is the residual mean square, and entering the t-table with 42 d.f., or 
by calculating F = B²/s² and entering the F-table with n₁ = 1, n₂ = 42. 
Here the value of B is highly significant (t=7.45, P<10-*). The value 
of C may be tested in exactly the same way. In this case it is obvious 
that C does not differ significantly from zero. 

Since these tests give no reason for doubting the validity of the assumptions 
underlying the analysis of the assay, the logarithm of the 
ratio of the potencies may be computed as 

    M = (0.30103)(8.00)/(-58.50) = -.0412,

with an estimated standard error given by 

    s_M = \frac{s I \sqrt{B^2 + D^2}}{B^2} = 0.0820. ([4], formula (4).) 
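
The test of B and the value of M can be retraced from the figures already given (B = −58.50, D = 8.00, residual mean square 61.65, doses 0.30 and 0.60 units). The following is a sketch of that arithmetic only; the standard error is not recomputed here, and the variable names are illustrative.

```python
import math

B, D = -58.50, 8.00            # contrasts computed in the text
s2 = 61.65                     # residual mean square, 42 d.f.
I = math.log10(0.60 / 0.30)    # difference between the logarithms of the doses

t = abs(B) / math.sqrt(s2)     # entered in the t-table with 42 d.f.
F = B ** 2 / s2                # the equivalent F with n1 = 1, n2 = 42
M = I * D / B                  # logarithm of the ratio of the potencies
print(round(t, 2), round(M, 4))
```
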


2. The simple analysis outlined above breaks down when an observation 
is missing and constants must be fitted to separate out the contributions 
of rows, columns and treatments. In order to set up the 
problem in symbols, let y_{ijk} represent the observation in row i and 
column j, where k = k(i, j) denotes the treatment in row i and column j 
provided by the latin square arrangement. Under the same assumptions 
that were made in the analysis given above, the equation representing 
the dependence of y on its row, column and treatment may be written 

    (2.1)   Y_{ijk} = m + r_i + c_j + t_k, \quad \text{with} \quad \sum r_i = \sum c_j = \sum t_k = 0.

The r’s, c’s and t’s represent estimates of the deviations, from the 
general mean, of the average contributions to the observed values of 
rows, columns, and treatments. m is the estimate of the general mean. 
Y_{ijk} is the estimate of the mean response to treatment k in row i and 
column j. In order to bring it into conformity with the standard form 
of regression equation, this equation may be written 

    (2.2)   Y_{ijk} = m + \sum_{\alpha} r_\alpha \delta_{\alpha i} + \sum_{\beta} c_\beta \delta_{\beta j} + \sum_{\gamma} t_\gamma \delta_{\gamma k},

where δ_{pq} = 1 or 0 according as p = q or p ≠ q, and the summations extend 
over all rows, columns and treatments remaining in the assay. (This 
form of the equation is needed in section 5.) The δ’s are the independent 
variables and the r’s, c’s, t’s, and m are regression coefficients, chosen 
to minimize S(y − Y)², where S denotes summation over all the observations. 
The minimizing of this sum, subject to the restrictions Σ r_i = 0 
etc., may be carried out by introducing Lagrange multipliers and finding 
the unrestricted minimum of 

    \varphi = S(y - Y)^2 + 2\lambda \sum r_i + 2\mu \sum c_j + 2\nu \sum t_k.

The derivatives of φ with respect to m, the r’s, c’s and t’s, and λ, μ and 
ν, equated to zero, provide a set of normal equations which determine 
the values of the regression coefficients. 

When no observation is missing these equations are very simple. Using 

G   to denote the sum of all the observations, 
R_i to denote the sum of all the observations in row i, 
C_j to denote the sum of all the observations in column j, 
T_k to denote the sum of all the observations on treatment k, 

the normal equations are 

    (2.3)   16rm = G,
            4rm + 4r r_i + \lambda = R_i,   i = 1, 2, 3, 4,
            4m + 4c_j + \mu = C_j,          j = 1, 2, \ldots, 4r,
            4rm + 4r t_k + \nu = T_k,       k = 1, 2, 3, 4.

The solutions follow easily. 

    (2.4)   m = G/16r, \quad r_i = R_i/4r - G/16r, \quad c_j = C_j/4 - G/16r,
            t_k = T_k/4r - G/16r, \quad \lambda = \mu = \nu = 0.

The residual sum of squares, obtained by applying a standard least-squares 
formula, is given by 

    (2.5)   S(y - Y)^2 = Sy^2 - mG - \sum r_i R_i - \sum c_j C_j - \sum t_k T_k.
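
Solutions (2.4) and the residual formula (2.5) can be checked on a small complete table. The sketch below uses r = 2 replications, a cyclic arrangement of treatments and invented responses, all of which are assumptions made for illustration; rows, columns and treatments are numbered from zero in the code.

```python
r = 2  # hypothetical number of replications


def treatment(i, j):
    """Hypothetical cyclic layout: each block of four columns is a latin square."""
    return (i + j) % 4


# Invented responses for rows i = 0..3 and columns j = 0..4r-1.
obs = [(i, j, treatment(i, j),
        60 + 4 * i - 3 * (j % 3) + 6 * treatment(i, j) + (5 * i * j + 2 * j) % 7)
       for i in range(4) for j in range(4 * r)]

G = sum(y for _, _, _, y in obs)
R = [sum(y for i, _, _, y in obs if i == a) for a in range(4)]
C = [sum(y for _, j, _, y in obs if j == b) for b in range(4 * r)]
T = [sum(y for _, _, k, y in obs if k == g) for g in range(4)]

# Solutions (2.4).
m = G / (16 * r)
row = [Ri / (4 * r) - m for Ri in R]
col = [Cj / 4 - m for Cj in C]
trt = [Tk / (4 * r) - m for Tk in T]

# Residual sum of squares, directly and by formula (2.5).
direct = sum((y - (m + row[i] + col[j] + trt[k])) ** 2 for i, j, k, y in obs)
by_formula = (sum(y * y for _, _, _, y in obs) - m * G
              - sum(a * b for a, b in zip(row, R))
              - sum(a * b for a, b in zip(col, C))
              - sum(a * b for a, b in zip(trt, T)))
print(direct, by_formula)
```
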


Now Sy² − mG = Sy² − G²/16r is the “total” sum of squares of the analysis 
of variance, and Σ r_i R_i = Σ R_i(R_i/4r − G/16r) = (Σ R_i²)/4r − G²/16r 
is the sum of squares properly ascribed to variation among rows, undisturbed 
by differences among columns and among treatments. Similarly, 

    Σ c_j C_j = (Σ C_j²)/4 − G²/16r is the “column” sum of squares, 
    Σ t_k T_k = (Σ T_k²)/4r − G²/16r is the “treatment” sum of squares. 

These sums of squares are the same as those entered in the analysis of 
variance table of section 1. It is to be remarked also that the combinations 
of observations designated B, C, D, are linear functions of 
t_1, t_2, t_3, t_4. For example, B = √r(−t_1 + t_2 − t_3 + t_4). 

3. When some observations are missing, the setting-up and solving 
of the normal equations are in general very laborious. However, in some 
cases, the equations are simple or can be made so by employing suitable 
devices; in others, the normal equations may be avoided entirely by 
using “missing-plot” formulae. The purpose of this paper is to explore 
these two situations, providing appropriate formulae when possible 
and illustrating devices which may be used when a solution by means 
of formulae is not practicable. No particular novelty is claimed for 
most of this discussion. It represents simply a collection of methods, 
most of them well known, for application to a specific problem. 

The cases to be considered are the following: 

(1) one or more single observations are missing; 
(2) several columns are missing; 
(3) one column is missing; 
(4) two columns are missing; 
(5) one column and one or more single observations are missing. 

Cases (1), (3) and (5) may be treated by means of formulae; cases (2) 
and (4) require the solution of a set of normal equations. The more 
general situation in which several columns and one or more single observations 
are missing is not discussed, since no simple formulae are 
available in such cases. 
4. Case (1). When one or more single observations are missing, the 
solving of normal equations may be avoided by applying the argument 
given by Yates [3], which, in brief, runs as follows. Suppose that the 
normal equations have been solved for the values of m, the r’s, c’s and 
t’s. Then Y_{ijk} = m + r_i + c_j + t_k is determined so that S(y − Y)² is a minimum, 
subject of course to the restrictions Σ r_i = 0, etc. Suppose also 
that the Y-values corresponding to the missing observations are used 
to fill in the gaps in the table of observations. Now let Y'_{ijk} = m' + r'_i + c'_j 
+ t'_k be a regression equation fitted to the numbers in this completed 
table. This amounts to choosing Y' to minimize S(y − Y')² + Σ(Y − Y')², 
where S represents, as before, a summation over the actual observations 
and Σ here indicates summation over the values which were 
filled in from the regression equation. It is clear that this sum is 
minimized by putting Y' = Y, that is, the regression equation fitted 
to the completed table is identical with that fitted to the actual observations 
alone. The fitting to the completed set of observations is immediate, 
since the solutions obtained in section 2 now apply. It follows 
that when the analysis shown in section 1 is run on the completed table 
of observations, the error term and the components B, C, D are strictly 
correct. Hence if a way can be found to determine the Y-values to substitute 
for the missing observations, without first fitting the regression 
equation, a simple and direct method of analysis is provided. This is 
easily done. 

Suppose that a single observation is missing, say in row l and column 
m. Let the symbol y_{lmn} be written for the number which ultimately will 
be substituted in this position. When a regression equation 
Y_{ijk} = m + r_i + c_j + t_k is fitted to this “completed” table, the coefficients, 
m = G/16r, r_i = R_i/4r − G/16r, etc. (from section 2) involve the symbol 
y_{lmn}. Substitution for m, r_i, c_j, t_k in the regression equation yields the 
relation 

    Y_{lmn} = R_l/4r + C_m/4 + T_n/4r - 2G/16r.

Now the value of y_{lmn} is obtained by equating Y_{lmn} to y_{lmn}. This forms 
a linear equation in y_{lmn} whose solution is 

    (4.1)   y_{lmn} = \frac{4r C'_m + 4(R'_l + T'_n) - 2G'}{12r - 6}.

The primes on the symbols indicate that these totals, all of which involve 
the cell of the table from which the observation is missing, are 
sums only of the actual observations. Thus C'_m, for example, represents 
the sum of the three observations in the column from which the observation 
is missing. When r = 1, this formula reduces to that given by 
Yates ([3], page 132) for a single 4×4 latin square. 
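
Formula (4.1) can be tried on an invented table. In the sketch below the layout of treatments, the responses and the choice of missing cell are all hypothetical, and rows and columns are numbered from zero. The check at the end refits the completed table with the section 2 solutions and confirms that the fitted value in the filled cell equals the value substituted.

```python
R_REPS = 2                                  # hypothetical number of replications


def treatment(i, j):
    """Hypothetical cyclic latin-square layout: treatment in row i, column j."""
    return (i + j) % 4


def missing_value(observed, l, m_col, r=R_REPS):
    """Formula (4.1); `observed` maps (row, column) to the actual observations."""
    n = treatment(l, m_col)
    G = sum(observed.values())
    R = sum(y for (i, j), y in observed.items() if i == l)
    C = sum(y for (i, j), y in observed.items() if j == m_col)
    T = sum(y for (i, j), y in observed.items() if treatment(i, j) == n)
    return (4 * r * C + 4 * (R + T) - 2 * G) / (12 * r - 6)


# Invented responses; the cell in row 1, column 2 is treated as missing.
full = {(i, j): 60 + 4 * i - 3 * (j % 3) + 6 * treatment(i, j) + (5 * i * j + 2 * j) % 7
        for i in range(4) for j in range(4 * R_REPS)}
cell = (1, 2)
observed = {c: y for c, y in full.items() if c != cell}
y_fill = missing_value(observed, *cell)

# Refit the completed table: fitted value = R/4r + C/4 + T/4r - 2G/16r.
done = dict(observed)
done[cell] = y_fill
l, m_col = cell
n = treatment(l, m_col)
G = sum(done.values())
R_l = sum(y for (i, j), y in done.items() if i == l)
C_m = sum(y for (i, j), y in done.items() if j == m_col)
T_n = sum(y for (i, j), y in done.items() if treatment(i, j) == n)
fitted = R_l / (4 * R_REPS) + C_m / 4 + T_n / (4 * R_REPS) - 2 * G / (16 * R_REPS)
print(y_fill, fitted)
```
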

Formulae for obtaining simultaneously values to replace several 
missing observations may be derived in the same manner. However it 
is simpler to use an iterative process based on the single-value formula. 
This procedure is described in a number of places ([3] page 133, 
[5] page 263). 
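
One way such an iterative process might be arranged is sketched below. It is not taken from the references cited; the starting values (the mean of the actual observations), the stopping rule, the layout and the data are all assumptions. Each missing cell in turn is recomputed by the single-value formula (4.1), the current values in the other missing cells being treated as if they were observations, until the values cease to change.

```python
R_REPS = 2                                  # hypothetical number of replications


def treatment(i, j):
    """Hypothetical cyclic latin-square layout."""
    return (i + j) % 4


def single_value(table, cell, r=R_REPS):
    """Formula (4.1) for `cell`, every other cell of `table` taken as known."""
    l, m_col = cell
    n = treatment(l, m_col)
    others = {c: y for c, y in table.items() if c != cell}
    G = sum(others.values())
    R = sum(y for (i, j), y in others.items() if i == l)
    C = sum(y for (i, j), y in others.items() if j == m_col)
    T = sum(y for (i, j), y in others.items() if treatment(i, j) == n)
    return (4 * r * C + 4 * (R + T) - 2 * G) / (12 * r - 6)


def fill_missing(observed, missing, tol=1e-10, max_iter=200):
    """Repeat the single-value formula over the missing cells until stable."""
    table = dict(observed)
    start = sum(observed.values()) / len(observed)
    for cell in missing:
        table[cell] = start
    for _ in range(max_iter):
        change = 0.0
        for cell in missing:
            new = single_value(table, cell)
            change = max(change, abs(new - table[cell]))
            table[cell] = new
        if change < tol:
            break
    return table


full = {(i, j): 60 + 4 * i - 3 * (j % 3) + 6 * treatment(i, j) + (5 * i * j + 2 * j) % 7
        for i in range(4) for j in range(4 * R_REPS)}
missing = [(0, 0), (1, 2), (3, 5)]
observed = {c: y for c, y in full.items() if c not in missing}
filled = fill_missing(observed, missing)
print({c: round(filled[c], 3) for c in missing})
```
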

The sum of squares of residuals may be computed according to formula 
(2.5), that is, according to the rules used when no observation is 
missing, with the totals made up to include the numbers substituted 
for the missing values. The number of degrees of freedom associated 
with this sum of squares is 12r − 6 less the number of missing observations. 

The sum Σ r_i R_i cannot now be identified with variation arising solely 
from differences among rows, owing to correlations among the row, 
column and treatment constants. Likewise, the sums Σ c_j C_j and 
Σ t_k T_k do not represent variation among columns only and among 
treatments only. However, the effects of these correlations are small 
enough to be ignored, unless the number of missing observations is 
large ([3] page 138). The effect of a missing observation on the variances 
of B, C, and D is discussed briefly in section 8. 

5. Case (2). When the missing observations make up complete columns 
it is, in general, necessary to solve a set of normal equations and 
the only question requiring discussion concerns the form in which this 
calculation is best carried out. Normal equations may be set up and 
solved for the constants r_i, c_j, t_k as in section 2 and the combinations 
of the t’s which give the values of B, C, D may be calculated. However, 
it seems simpler to write the regression equation in such a form as to 
yield these combinations directly. This may be done by rearranging 
equation (2.2) into the identically equivalent form 

    (5.1)   Y_{ijk} = m + \sum_{\alpha=1}^{3} r_\alpha^* u_{\alpha ijk} + \sum_{\beta=1}^{q-1} c_\beta^* v_{\beta ijk} + \sum_{\gamma=1}^{3} t_\gamma^* w_{\gamma ijk},

where 

    u_{1ijk} = \delta_{1i} + \delta_{2i} - \delta_{3i} - \delta_{4i},      w_{1ijk} = \delta_{1k} + \delta_{2k} - \delta_{3k} - \delta_{4k},
    u_{2ijk} = -\delta_{1i} + \delta_{2i} - \delta_{3i} + \delta_{4i},     w_{2ijk} = -\delta_{1k} + \delta_{2k} - \delta_{3k} + \delta_{4k},
    u_{3ijk} = -\delta_{1i} + \delta_{2i} + \delta_{3i} - \delta_{4i},     w_{3ijk} = -\delta_{1k} + \delta_{2k} + \delta_{3k} - \delta_{4k},

    v_{\beta ijk} = \delta_{1j} + \delta_{2j} + \cdots + \delta_{\beta j} - \beta \delta_{\beta+1, j},   \beta = 1, 2, \ldots, (q - 1),

and 

    r_1^* = (r_1 + r_2 - r_3 - r_4)/4,      t_1^* = (t_1 + t_2 - t_3 - t_4)/4,
    r_2^* = (-r_1 + r_2 - r_3 + r_4)/4,     t_2^* = (-t_1 + t_2 - t_3 + t_4)/4,
    r_3^* = (-r_1 + r_2 + r_3 - r_4)/4,     t_3^* = (-t_1 + t_2 + t_3 - t_4)/4,

    c_\beta^* = (c_1 + c_2 + \cdots + c_\beta - \beta c_{\beta+1}) / \beta(\beta + 1),   \beta = 1, 2, \ldots, (q - 1).

The symbol q is used in these formulae to denote the number of columns 
(animals) remaining in the assay. There are no restraints on the 
values of the regression coefficients when the equation is written in this 
form. 

The w’s and t*’s have been chosen to make t_1*, t_2*, t_3* proportional to 
D, B, C. The choice of the u’s and r*’s is based chiefly on considerations 
of symmetry, since the values of the row constants are of no direct 
interest in this example. Likewise the column constants play no direct 
part in the evaluation of the assay and may be chosen in any way considered 
suitable. In this particular case, no simplicity is gained by replacing 
the original independent variables (the δ_{βj}) and column constants 
(the c_j) by orthogonal combinations of them (such as the v’s 
and c*’s). However, this change of variables may well prove useful, in 
other situations of this kind, in simplifying a system of normal equations. 
Other orthogonal functions may, of course, be used, according 
to the needs of the particular situation. 

Assuming that no observation is missing, apart from the missing 
columns, all sums and sums of products of the independent variables 
(the u’s, v’s and w’s) vanish except those of form S(u_{αijk} w_{γijk}), 
α, γ = 1, 2, 3. These sums may be written 

    \sum_{i=1}^{4} u_{\alpha ijk} \sum_{j=1}^{q} w_{\gamma ijk} = \sum_{i=1}^{4} a_{i\gamma} u_{\alpha ijk},

by letting a_{iγ} represent the value of Σ_{j=1}^{q} w_{γijk}. Thus a_{iγ} is simply the 
sum of the values taken by w_{γijk} over row i. These numbers a_{iγ} are 
always zero for the intact design, but if some columns are missing, their 
values depend on the arrangement of treatments in the missing columns. 
These values are always small positive or negative integers and are 
easily calculated. In all cases Σ_{i=1}^{4} a_{iγ} = 0, γ = 1, 2, 3, which provides 
a useful check on this calculation. When the values of the a_{iγ} have been 
calculated, the sums Σ_{i=1}^{4} a_{iγ} u_{αijk} = A_{αγ} (say) may be evaluated. The 
other coefficients needed for the normal equations are 

    S u_{\alpha ijk}^2 = S w_{\gamma ijk}^2 = 4q, \quad S v_{\beta ijk}^2 = 4\beta(\beta + 1), \quad \beta = 1, 2, \ldots, (q - 1).

The normal equations are 

    4qm = G,
    4\beta(\beta + 1) c_\beta^* = C_\beta^*,   \beta = 1, 2, \ldots, (q - 1),
    4q r_1^* + A_{11} t_1^* + A_{12} t_2^* + A_{13} t_3^* = R_1^*,
    4q r_2^* + A_{21} t_1^* + A_{22} t_2^* + A_{23} t_3^* = R_2^*,
    4q r_3^* + A_{31} t_1^* + A_{32} t_2^* + A_{33} t_3^* = R_3^*,
    A_{11} r_1^* + A_{21} r_2^* + A_{31} r_3^* + 4q t_1^* = T_1^*,
    A_{12} r_1^* + A_{22} r_2^* + A_{32} r_3^* + 4q t_2^* = T_2^*,
    A_{13} r_1^* + A_{23} r_2^* + A_{33} r_3^* + 4q t_3^* = T_3^*,

where C_β*, R_1*, ..., T_3* stand for the following combinations of the observations. 

    C_\beta^* = C_1 + C_2 + \cdots + C_\beta - \beta C_{\beta+1},   \beta = 1, 2, \ldots, (q - 1),
    R_1^* = R_1 + R_2 - R_3 - R_4,      T_1^* = T_1 + T_2 - T_3 - T_4,
    R_2^* = -R_1 + R_2 - R_3 + R_4,     T_2^* = -T_1 + T_2 - T_3 + T_4,
    R_3^* = -R_1 + R_2 + R_3 - R_4,     T_3^* = -T_1 + T_2 + T_3 - T_4.

The residual sum of squares is calculated from the formula 

    S(y - Y)^2 = \left[ Sy^2 - mG \right] - \left[ \sum_{\beta=1}^{q-1} c_\beta^* C_\beta^* \right] - \left[ \sum_{\alpha=1}^{3} r_\alpha^* R_\alpha^* + \sum_{\gamma=1}^{3} t_\gamma^* T_\gamma^* \right].

The term in the first square bracket is the “total” sum of squares of 
deviations about the mean, usually calculated according to the formula 
Sy² − G²/4q; the second term is the sum of squares attributable to 
variation among columns and is best calculated from the equivalent 
formula Σ_{j=1}^{q} C_j²/4 − G²/4q; the third term derives its value from variation 
due to “rows and treatments.” The sums Σ r_α* R_α* and Σ t_γ* T_γ* 
cannot be regarded separately as exhibiting variation among rows and 
among treatments. When such quantities are needed they may be 
obtained without difficulty, but they are not required here and the 
question will not be discussed. The only quantities needed for the analysis 
of the assay are S(y − Y)², the numbers t_1*, t_2*, t_3*, and certain elements 
of the inverse matrix. 

If the covariance of t_γ* and t_γ'* is denoted σ²c_{γγ'}, then c_{γγ'} is one of 
the elements of the inverse matrix. Its value may be obtained by putting 
unity for T_γ'* and zeros for all other R*’s and T*’s in the expression 
for t_γ*, obtained from the solution of the normal equations. This rule 
applies also when γ = γ', in which case σ²c_{γγ} is the variance of t_γ*. The 
value of σ² is estimated by s² = S(y − Y)²/(3q − 6). 

Any one of the t*’s may be tested for significant departure from zero, 
either by calculating t = t_γ*/(s√c_{γγ}) with 3q − 6 degrees of freedom, or 
F = t_γ*²/(s²c_{γγ}) with n₁ = 1, n₂ = 3q − 6. 

The estimate of the logarithm of the ratio of the potencies is 

    (5.3)   M = I t_1^* / t_2^*

with an estimated standard error given by 

    (5.4)   s_M = \frac{I s}{t_2^{*2}} \sqrt{c_{11} t_2^{*2} - 2 c_{12} t_2^* t_1^* + c_{22} t_1^{*2}},

which is simply the form taken in this case by the first order approximation 
to the standard error of a ratio. 

The computation of the numbers A_{αγ} from their definition is not difficult 
and does not require an inordinate amount of time. However, this 
calculation may be simplified and systematized considerably by regarding 
the matrix (A_{αγ}) as the product of two matrices as follows (shortening 
the symbol u_{αijk} to u_{αi}): 

    \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}
    =
    \begin{pmatrix} u_{11} & u_{12} & u_{13} & u_{14} \\ u_{21} & u_{22} & u_{23} & u_{24} \\ u_{31} & u_{32} & u_{33} & u_{34} \end{pmatrix}
    \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{pmatrix}.

The u-matrix is determined by the choice of the functions u_{αijk} and is 
fixed when this choice has been made. For the u’s chosen in this paper, 
the u-matrix is 

    \begin{pmatrix} 1 & 1 & -1 & -1 \\ -1 & 1 & -1 & 1 \\ -1 & 1 & 1 & -1 \end{pmatrix}.

The a-matrix depends on the form of the functions w_{γijk} and on the 
arrangement of treatments in the missing columns. When this matrix 
is determined for a given case, the A-matrix is calculated by matrix 
multiplication. 
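
The matrix calculation may be sketched as follows. The sketch rests on one step not spelled out above: since the row sums of the w’s vanish for the intact design, each a-value for the reduced design is taken as minus the sum of the corresponding w over the cells removed from that row. The function names and the example column are illustrative only.

```python
# Sign patterns of the u's (indexed by row) and of the w's (indexed by treatment).
U = [[1, 1, -1, -1],
     [-1, 1, -1, 1],
     [-1, 1, 1, -1]]
W = U  # the w's use the same three patterns, applied to treatments 1 to 4


def a_matrix(missing_columns):
    """4 x 3 a-matrix; each missing column lists the treatments in rows 1 to 4."""
    a = [[0, 0, 0] for _ in range(4)]
    for column in missing_columns:
        for i, k in enumerate(column):
            for g in range(3):
                a[i][g] -= W[g][k - 1]
    return a


def big_a_matrix(missing_columns):
    """3 x 3 A-matrix, the product of the u-matrix and the a-matrix."""
    a = a_matrix(missing_columns)
    return [[sum(U[al][i] * a[i][g] for i in range(4)) for g in range(3)]
            for al in range(3)]


# One missing column, with treatment 1 in row 1, treatment 2 in row 2, etc.
a = a_matrix([(1, 2, 3, 4)])
A = big_a_matrix([(1, 2, 3, 4)])
column_sums = [sum(a[i][g] for i in range(4)) for g in range(3)]
print(A, column_sums)
```

The vanishing column sums reproduce the check on the a-values mentioned in the text.
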
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The solution of the normal equations is in general fairly laborious, 
but in the cases which are likely to occur most frequently the solution 
is simple or can be avoided altogether. A discussion of these cases fol- 
lows. 

6. Case (3). When a single column is missing, no essential restriction
is imposed by the assumption that treatment 1 is missing from row 1,
treatment 2 from row 2, etc., since this can always be arranged by
renumbering the rows. The A-matrix for this order is

$$\begin{pmatrix} -4 & 0 & 0\\ 0 & -4 & 0\\ 0 & 0 & -4\end{pmatrix}.$$
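
The matrix product that gives this A-matrix can be sketched in a few lines
of Python. The sketch assumes that the treatment functions w take, for
treatment k, the values in column k of the u-matrix; this is inferred from
list (7.1) and from the description of w3 in section 9, and is not stated
here in so many words. The helper names are hypothetical.

```python
# u-matrix used in this paper (rows: alpha = 1, 2, 3; columns: rows 1..4).
U = [[1, 1, -1, -1],
     [-1, 1, -1, 1],
     [-1, 1, 1, -1]]


def w(treatment):
    """(w1, w2, w3) for treatment 1..4; assumed to equal column k of U."""
    return [U[g][treatment - 1] for g in range(3)]


def a_matrix(missing):
    """missing[i] lists the treatments absent from row i + 1.

    Each w sums to zero over an intact row, so a_ig is minus the sum of
    w_g over the treatments missing from that row.
    """
    return [[-sum(w(k)[g] for k in row) for g in range(3)] for row in missing]


def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


# One column missing, treatment i absent from row i.
a = a_matrix([[1], [2], [3], [4]])
A = matmul(U, a)
print(A)
```

The same helper reproduces a row of the a-matrix for any set of missing
treatments, for example a_matrix([[2, 4, 3]]) for a row from which
treatments 2, 4 and 3 are absent.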


The solutions of the resulting normal equations are

(6.1) $r_\alpha^* = (qR_\alpha^* + T_\alpha^*)/[4(q^2 - 1)]$, $t_\gamma^* = (qT_\gamma^* + R_\gamma^*)/[4(q^2 - 1)]$,
$\alpha, \gamma = 1, 2, 3,$

and the elements $c_{\gamma\gamma'}$ of the inverse matrix are

(6.11) $c_{11} = c_{22} = c_{33} = q/[4(q^2 - 1)]$, $c_{12} = c_{23} = c_{31} = 0$.


Substitution of these values for the $r^*$'s and $t^*$'s in
$\sum r_\alpha^* R_\alpha^* + \sum t_\gamma^* T_\gamma^*$ yields the formula

(6.2) $\left(\sum_{\alpha=1}^{3} R_\alpha^{*2} + \sum_{\gamma=1}^{3} T_\gamma^{*2}\right)/[4(q + 1)] + \sum_{\alpha=1}^{3} (R_\alpha^* + T_\alpha^*)^2/[4(q^2 - 1)]$

for the reduction in the sum of squares due to rows and treatments.
Since this formula does not require the values of the $r^*$'s, these
numbers would not ordinarily be computed.
It may be remarked that the sum of squares due to treatments alone
(eliminating rows) is given by $\frac{4(q^2 - 1)}{q}\sum t_\gamma^{*2}$, since the $t^*$'s are
mutually uncorrelated. Likewise the sum of squares due to rows
(eliminating treatments) is $\frac{4(q^2 - 1)}{q}\sum r_\alpha^{*2}$. These facts are not
needed for the analysis of the assay, however.

The formulae deduced in this section may be applied whenever only
one column is missing. In this way the solution of normal equations is
avoided and indeed the analysis is no more difficult and requires only
slightly more computation than that of the intact design.
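
The formulae of this section lend themselves to a short routine. The
sketch below, with hypothetical function names, forms the R*'s and T*'s
from the row and treatment totals by means of the u-type contrasts and
then applies the solutions for the fitted constants, the reduction in the
sum of squares, and the common diagonal element of the inverse matrix. The
totals and q = 15 are those of the Case (3) example in section 9.

```python
def contrasts(totals):
    """Apply the three contrasts of the u-matrix to four totals."""
    x1, x2, x3, x4 = totals
    return [x1 + x2 - x3 - x4, -x1 + x2 - x3 + x4, -x1 + x2 + x3 - x4]


def one_column_missing(row_totals, treatment_totals, q):
    """Fitted constants, reduction in S.S. and c for one missing column."""
    R = contrasts(row_totals)
    T = contrasts(treatment_totals)
    d = 4 * (q * q - 1)
    r = [(q * Ra + Ta) / d for Ra, Ta in zip(R, T)]   # formula (6.1)
    t = [(q * Ta + Ra) / d for Ra, Ta in zip(R, T)]   # formula (6.1)
    reduction = (sum(x * x for x in R + T) / (4 * (q + 1))
                 + sum((Ra + Ta) ** 2 for Ra, Ta in zip(R, T)) / d)  # (6.2)
    c = q / d                                         # formula (6.11)
    return r, t, reduction, c


r, t, reduction, c = one_column_missing(
    (839, 872, 803, 980), (1004, 781, 964, 745), q=15)
print(round(reduction, 2), round(c, 5))
```

The reduction obtained this way agrees with the value printed in section 9
to within rounding, and c agrees with the 0.01674 quoted there.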

7. Case (4). When two columns are missing the solving of the normal 
equations cannot conveniently be avoided, but the determination of 
the A-matrix can be made very simple, and the equations are not par- 
ticularly difficult to solve. 


Let the symbol

$$\begin{bmatrix} (p_1, p_2)\\ (q_1, q_2)\\ (r_1, r_2)\\ (s_1, s_2)\end{bmatrix}$$

associated with a design in which two columns are missing, be defined
to mean: treatments $p_1$ and $p_2$ are missing from row 1, treatments $q_1$
and $q_2$ are missing from row 2, etc. It is clear that the values of $a_{i1}$,
$a_{i2}$, $a_{i3}$, that is, the members of the $i$th row of the a-matrix, depend
only on the treatments missing from row $i$ of the design. There are only
ten different pairs of treatments which may be absent from any row, and
therefore there are only ten different rows which may occur in the a-matrix.
No advantage is gained here by reordering rows to bring the missing 
treatments in one of the columns into an assigned order since the num- 
ber of different pairs is the same in any case. These ten pairs, together 
with the corresponding rows of the a-matrix, are listed below. 


(7.1)
    Pair      a_i1   a_i2   a_i3
    (1, 1)     -2     +2     +2
    (1, 2)     -2      0      0
    (1, 3)      0     +2      0
    (1, 4)      0      0     +2
    (2, 2)     -2     -2     -2
    (2, 3)      0      0     -2
    (2, 4)      0     -2      0
    (3, 3)     +2     +2     -2
    (3, 4)     +2      0      0
    (4, 4)     +2     -2     +2


The setting-up of the a-matrix will consist simply in writing down the
symbol associated with the design and substituting for each curved
bracket $(p_1, p_2)$ etc., the corresponding row taken from this list. The use
of this device is illustrated in section 9.

Similar lists can be made up for obtaining the a-matrix when more 
than two columns are missing, but the lists are long and presumably 
would be used only rarely. 
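
List (7.1) can also be generated mechanically, which may be convenient if
longer lists are wanted. The sketch below assumes, as the list itself
implies, that the row of the a-matrix for a pair of missing treatments is
minus the sum of the w-values of those two treatments; the table W of
w-values and the names used are assumptions made for illustration.

```python
from itertools import combinations_with_replacement

# Assumed values (w1, w2, w3) of the treatment functions for treatments 1..4.
W = {1: (1, -1, -1), 2: (1, 1, 1), 3: (-1, -1, 1), 4: (-1, 1, -1)}


def a_row(pair):
    """Row of the a-matrix when the two treatments in `pair` are missing."""
    first, second = pair
    return tuple(-(W[first][g] + W[second][g]) for g in range(3))


# The ten unordered pairs and their rows, in the order of list (7.1).
LIST_7_1 = {pair: a_row(pair)
            for pair in combinations_with_replacement((1, 2, 3, 4), 2)}

for pair, row in LIST_7_1.items():
    print(pair, row)
```

Because a_row does not depend on the order of the pair, a bracket written
as (4, 3) in the symbol of a design gives the same row as (3, 4) in the
list.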

8. Case (5). Suppose now that one column and another single observation
are missing. This case can be treated by a combination of the
methods used in sections 4 and 6. After rows have been reordered to
bring treatment 1 in the missing column into row 1, etc., let the single
missing observation occur in row $l$ and column $m$. Substitute for this
missing observation the symbol $y_{lmn}$, and fit to the “completed” set of
observations a regression equation $Y_{ijk} = m + r_i + c_j + t_k$. The values of
the fitted constants are easily determined and, when substituted in the
regression equation, yield the relation

$Y_{ijk} = C_j/4 + [q(R_i + T_k) + R_k + T_i]/(q^2 - 1) - G/[2(q - 1)]$.

Now, according to the argument used in section 4, the number to be
substituted for the missing observation is determined by equating
$y_{lmn}$ to $Y_{lmn}$. Thus the formula for the missing value is obtained by
solving for $y_{lmn}$ the equation

$y_{lmn} = C_m/4 + [q(R_l + T_n) + R_n + T_l]/(q^2 - 1) - G/[2(q - 1)]$.

Two cases arise. If $n = l$, all terms on the right side of this equation
involve $y_{lmn}$; if $n \neq l$, the terms $R_n$ and $T_l$ do not contain $y_{lmn}$. The
solutions in the two cases are

(8.1) $y_{lml} = [(q - 1)C_m' + 4(R_l' + T_l') - 2G']/(3q - 9)$,

(8.11) $c_{11} = c_{22} = c_{33} = \dfrac{1}{4(q^2 - 1)}\left[q + \dfrac{q + 1}{3q - 9}\right]$,

and

(8.2) $y_{lmn} = [(q^2 - 1)C_m' + 4q(R_l' + T_n') + 4(R_n + T_l) - 2(q + 1)G']/(3q^2 - 6q - 1)$,

(8.21) $c_{11} = c_{22} = c_{33} = \dfrac{1}{4(q^2 - 1)}\left[q + \dfrac{q}{3q - 6}\right]$, approximately.

The prime indicates, as in section 4, that the total is incomplete. The 
derivations of formulae (8.11) and (8.21) are discussed later. 
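
The two missing-value formulae and the approximate formula (8.21) can be
sketched as follows. The function and argument names are hypothetical; a
trailing `_inc` marks a total that is incomplete (primed in the text). The
numbers in the call are those of the Case (5) example in section 9, where
q = 15.

```python
def missing_value_same(q, C_m_inc, R_l_inc, T_l_inc, G_inc):
    """Formula (8.1): the lost treatment is the one already missing in row l."""
    return ((q - 1) * C_m_inc + 4 * (R_l_inc + T_l_inc)
            - 2 * G_inc) / (3 * q - 9)


def missing_value_diff(q, C_m_inc, R_l_inc, T_n_inc, R_n, T_l, G_inc):
    """Formula (8.2): the lost treatment n differs from l."""
    numerator = ((q * q - 1) * C_m_inc + 4 * q * (R_l_inc + T_n_inc)
                 + 4 * (R_n + T_l) - 2 * (q + 1) * G_inc)
    return numerator / (3 * q * q - 6 * q - 1)


def c_approx(q):
    """Approximate variance factor of formula (8.21)."""
    return (q + q / (3 * q - 6)) / (4 * (q * q - 1))


y = missing_value_diff(q=15, C_m_inc=137, R_l_inc=815, T_n_inc=947,
                       R_n=839, T_l=781, G_inc=3437)
print(round(y, 2), round(c_approx(15), 5))
```

When several single observations are missing, the same functions would be
applied in turn to each gap, recomputing the incomplete totals after every
substitution until the supplied values settle.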

The argument given in section 4 shows that when this number is 
substituted for the missing observation and the analysis carried out
using the formulae of section 6, the values of the fitted constants and
of the residual sum of squares are identically the same as would be ob- 
tained by fitting the regression equation to the original observations 
alone. However, the formulae given in section 6 for the variances and 
covariances do not hold here. The following considerations yield the 
proper values without much computation. 

Suppose that the regression equation (5.1) is fitted to the original
observations. The left sides of the normal equations will be considerably
different from those of the equations in section 6, but the right sides, if
changed at all, will be altered only by the addition or subtraction of the
number $y_{lmn}$. Again using a prime to indicate that $y_{lmn}$ is omitted, the
right sides of these normal equations may be written $G'$, $C_m'$, $R^{*\prime}$, etc.,
where $G = G' + y_{lmn}$, $R_1^* = R_1^{*\prime} \pm y_{lmn}$, the plus sign to be used if
$l = 1$ or 2 and the minus sign if $l = 3$ or 4, with similar relations between
the other primed and unprimed symbols. Now, to find the variance of one
of the $t^*$'s, say $t_2^*$, we may replace $T_2^{*\prime}$ by unity and all other
$T^{*\prime}$'s, $R^{*\prime}$'s etc. by zeros in the formula for $t_2^*$ given by the solution
of the normal equations. This substitution gives the value of $c_{22}$.
Similarly, replacing $T_1^{*\prime}$ by unity and all other $T^{*\prime}$'s etc. by zeros, we
get the value of $c_{12}$.

The formulae for the $t^*$'s are given in section 5 expressed in terms of
the unprimed symbols. These formulae can be written in terms of the
primed symbols (remembering that $y_{lmn}$ is itself a function of the
primed symbols), and thus, making the substitutions indicated above,
are obtained the values of the c's.

The formulae for the c’s do not fall into any simple pattern, but there 
are two main types corresponding to the two missing-value formulae. 

When formula (8.1) is used, that is, when a treatment is missing twice 
in the same row, 


$c_{11} = c_{22} = c_{33} = \dfrac{1}{4(q^2 - 1)}\left[q + \dfrac{q + 1}{3q - 9}\right]$,

and $c_{12}$, $c_{23}$, $c_{31}$ are equal to $\pm 1/[12(q - 1)(q - 3)]$. The arrangement of
plus and minus signs depends on the whole configuration of missing 
treatments, but in all cases two of the covariances are positive and one 
is negative. 

When formula (8.2) is used, the values of $c_{11}$, $c_{22}$, $c_{33}$ are given by

$\dfrac{1}{4(q^2 - 1)}\left[q + \dfrac{(q \pm 1)^2}{3q^2 - 6q - 1}\right]$,

the plus sign applying to only one of the three, and the set of
quantities $c_{12}$, $c_{23}$, $c_{31}$ may take various combinations of the values
$\pm 1/[4(3q^2 - 6q - 1)]$, $\pm (q - 1)/[4(q + 1)(3q^2 - 6q - 1)]$. Thus it appears
that the variances and covariances require a rather complicated set of
formulae. The complexity may be avoided, however, by introducing suitable
approximations. For any values of $q$ that are likely to occur in
practice $(q \pm 1)^2/(3q^2 - 6q - 1)$ may be replaced by $q/(3q - 6)$ without
sensible loss of accuracy, which gives the approximate formula (8.21).
Likewise, in such cases, the covariances are numerically small and may
be ignored.

An argument of the same kind, applied to the case discussed in section 4,
shows that the effects of a single missing observation are to increase
the variances of B, C, D from $\sigma^2$ to $\sigma^2(12r - 5)/(12r - 6)$ and to
change the covariances from zero to $\pm\sigma^2/(12r - 6)$. These changes are
seen to be so small numerically as to permit the use of the formulae 
appropriate to the intact design, ignoring the slight inaccuracies result- 
ing from the missing observation. 

When one column and several single observations are missing, for- 
mulae (8.1) and (8.2) may be used iteratively to supply values for the 
single observations. An analysis performed on this “completed” set of 
observations, using formulae (6.1) and (6.2), yields the correct values 
of the residual sum of squares and of $t_1^*$, $t_2^*$, $t_3^*$. The variances and
covariances of the $t^*$'s are affected by the missing observations, but the error
committed in using formulae (8.11) and (8.21) in such cases should be 
small, unless the number of missing values is large. 

9. Examples. Case (1). Suppose that the observation in row 1 and
column 16 of Table I is missing. Then $R_1' = 872$, $C_{16}' = 224$, $T_2' = 781$,
$G' = 3718$. These numbers, substituted in formula (4.1), give the number
to be used for the missing observation.

$y_{1,16,2} = [16(224) + 4(872 + 781) - 2(3718)]/42 = 66$,
to the nearest integer.

The analysis of the completed table, carried out as in section 1, gives
the following values for the sums of squares and the components B, C, D.

                 d.f.     S.S.     m.s.
    rows           3       935
    columns       15    10,380
    treatments     3     3,461
    residual      41     2,587    63.11
    total         62    17,363
$D = 8.25$, $B = -58.25$, $C = 0.50$
$M = (.30103)(8.25)/(-58.25) = -.0426$
$s_M = (63.11)(.30103)\sqrt{(58.25)^2 + (8.25)^2}/(58.25)^2 = .0329$.

Here the value of M is correct, but the formula and value of $s_M$ are
inaccurate, owing to the correlation of B and D and the altered values of
their variances. Correcting for both of these sources of error changes the
value of $s_M$ to .0335, a negligible correction, particularly in view of the
fact that the formula for $s_M$ is itself only approximate. Similarly,
corrections in the formulae for testing the significance of B and C are not
important and may be ignored.

Case (2). Suppose that columns 14, 15 and 16 are missing. The first
step in setting up the normal equations is the computation of the
a-matrix. The value of $a_{23}$, for example, is the sum of the values taken by
$w_3$ for the 13 observations in row 2. Now $w_3$ takes the value 1 when the
observation is made on treatments 2 or 3, and takes the value $-1$ when
the observation is made on treatments 1 or 4. The treatments missing
from row 2 are treatments 2, 4, 3. Hence $a_{23} + 1 - 1 + 1 = 0$, $a_{23} = -1$,
since each $w$ sums to zero over each row when the design is intact. In
this way (or others) the a-matrix may be set up in a few minutes. It is
given below, multiplied on the left by the u-matrix to form the A-matrix.

$$\begin{pmatrix} 1 & 1 & -1 & -1\\ -1 & 1 & -1 & 1\\ -1 & 1 & 1 & -1\end{pmatrix} \begin{pmatrix} -1 & 1 & -1\\ 1 & -1 & -1\\ -1 & -1 & 1\\ 1 & 1 & 1\end{pmatrix} = \begin{pmatrix} 0 & 0 & -4\\ 4 & 0 & 0\\ 0 & -4 & 0\end{pmatrix}$$

The right sides of the normal equations (the $R^*$'s and $T^*$'s) are
computed from the row and treatment totals.

    R_1 = 763    T_1 = 866    R_1* =  -91    T_1* =  +17
    R_2 = 742    T_2 = 693    R_2* =  +81    T_2* = -355
    R_3 = 747    T_3 = 862    R_3* = -123    T_3* =   +9
    R_4 = 849    T_4 = 680

The calculation of the $R^*$'s and $T^*$'s may be checked by means of the
identities

$R_1^* + R_2^* + R_3^* + G = 4R_2$, $T_1^* + T_2^* + T_3^* + G = 4T_2$.

When the normal equations are set up, it is found that the six equations
break up into three pairs. The solutions are

$672t_1^* = 13T_1^* - R_2^*$, $672t_2^* = 13T_2^* + R_3^*$, $672t_3^* = 13T_3^* + R_1^*$,
$672r_2^* = 13R_2^* - T_1^*$, $672r_3^* = 13R_3^* + T_2^*$, $672r_1^* = 13R_1^* + T_3^*$.

Substitution of the numerical values of the $R^*$'s and $T^*$'s yields

    t_1* =  0.2083    r_1* = -1.7470
    t_2* = -7.0506    r_2* =  1.5417
    t_3* =  0.0387    r_3* = -2.9077

Substitution of unity for $T_1^*$ and zeros for all other $T^*$'s and $R^*$'s in
the formula for $t_1^*$ gives $c_{11} = 13/672$. Similarly, replacing $T_2^*$ by unity
and all other $T^*$'s and $R^*$'s by zeros in the same formula gives the value
of $c_{12}$. Evidently,

$c_{11} = c_{22} = c_{33} = 13/672$, $c_{12} = c_{23} = c_{31} = 0$.

The reduction in the sum of squares due to rows and treatments is

$\sum r_\alpha^* R_\alpha^* + \sum t_\gamma^* T_\gamma^* = 3148$.


The other numbers needed to complete the analysis of variance
table are: the “total” sum of squares, $(59)^2 + \cdots + (71)^2 - (3101)^2/52$
$= 198533 - 184926.94 = 13606$; the “columns” sum of squares,
$[(210)^2 + \cdots + (222)^2]/4 - (3101)^2/52 = 8847$.

                          d.f.     S.S.     m.s.
    columns                12     8,847
    rows and treatments     6     3,148
    residual               33     1,611    48.80
    total                  51    13,606

$t_2^*$ is highly significant, since

$F = (t_2^*)^2/(s^2 c_{22}) = 52.65$.

Likewise $t_3^*$ is not significant.

    M = -.0889 (Formula (5.3))
    s_M = .0411 (Formula (5.4))
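
The arithmetic of Case (2) can be checked by solving the six normal
equations directly. The sketch below is one way of doing so in exact
fractions; it assumes that the equations have 52 (the number of
observations) on the diagonal and the A-matrix coupling the r*'s with the
t*'s, and the A-matrix entered is the one inferred from the paired
solutions printed above. The helper `solve` is a plain Gauss-Jordan
elimination written for this illustration.

```python
from fractions import Fraction

N = 52                                    # 4 rows x 13 remaining columns
A = [[0, 0, -4], [4, 0, 0], [0, -4, 0]]   # assumed A-matrix (rows r*, cols t*)
R_star = [-91, 81, -123]
T_star = [17, -355, 9]


def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col]
                aug[i] = [v - factor * p for v, p in zip(aug[i], aug[col])]
    return [row[-1] for row in aug]


# Unknowns ordered r1*, r2*, r3*, t1*, t2*, t3*.
coeffs = [[N if i == j else 0 for j in range(3)] + A[i] for i in range(3)]
coeffs += [[A[i][g] for i in range(3)] + [N if g == j else 0 for j in range(3)]
           for g in range(3)]

solution = solve(coeffs, R_star + T_star)
r_star, t_star = solution[:3], solution[3:]
reduction = sum(x * y for x, y in zip(solution, R_star + T_star))
c11 = solve(coeffs, [0, 0, 0, 1, 0, 0])[3]   # unity for T1*, zeros elsewhere

print([round(float(v), 4) for v in t_star], round(float(reduction)), c11)
```

The last line of the computation mirrors the device used in the text:
replacing one right-hand side by unity and the rest by zeros yields an
element of the inverse matrix.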


Case (3). Suppose that column 16 is missing. Reordering rows to 
bring treatment 1 of column 16 into row 1, etc., in order to apply the 
formulae of section 6, the row and treatment totals and the numbers 


calculated from them are: 
    R_1 = 839    T_1 = 1004    R_1* =  -72    T_1* =  +76    R_1* + T_1* =   +4
    R_2 = 872    T_2 =  781    R_2* = +210    T_2* = -442    R_2* + T_2* = -232
    R_3 = 803    T_3 =  964    R_3* = -144    T_3* =   -4    R_3* + T_3* = -148
    R_4 = 980    T_4 =  745










The numbers $t_1^*$, $t_2^*$, $t_3^*$, given by formula (6.1) are 1068, $-6420$,
$-204$, divided by 896. The reduction in sum of squares due to rows and
treatments is

$[(-72)^2 + \cdots + (-4)^2]/64 + [(4)^2 + \cdots + (-148)^2]/896 = 4321.67$.
(Formula (6.2))
The other numbers needed for the analysis of variance table are calcu- 
lated as in the preceding example. 


                          d.f.     S.S.     m.s.
    columns                14     9,616
    rows and treatments     6     4,322
    residual               39     2,477    63.52
    total                  59    16,415


The tests of significance and the computation of M and its standard 
error proceed as in the preceding examples, using formula (6.11) to 
obtain the value $c_{11} = c_{22} = c_{33} = 0.01674$.

Case (4). Suppose that columns 15 and 16 are missing. The symbol 
describing the distribution of missing treatments and the a-matrix 
derived from it by means of list (7.1) are 


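$$\begin{bmatrix} (1, 2)\\ (4, 3)\\ (2, 1)\\ (3, 4)\end{bmatrix} \quad\text{and}\quad \begin{pmatrix} -2 & 0 & 0\\ +2 & 0 & 0\\ -2 & 0 & 0\\ +2 & 0 & 0\end{pmatrix}. \quad\text{The A-matrix is}\quad \begin{pmatrix} 0 & 0 & 0\\ 8 & 0 & 0\\ 0 & 0 & 0\end{pmatrix}.$$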


The rest of the calculation follows exactly as in case (2). 

Case (5). Suppose that column 16 and the observation in row 1, col- 
umn 15 are missing. Again the rows are reordered as in case (3) and 
the missing observation now occurs in row 2. The following numbers 
are needed to apply formula (8.2) to supply a number for the missing 
observation. 


$C_{15}' = 137$, $R_2' = 815$, $T_1' = 947$, $R_1 = 839$, $T_2 = 781$, $G' = 3437$.
Substituting these numbers in formula (8.2),

$y_{2,15,1} = [(224)(137) + 60(815 + 947) + 4(839 + 781) - (32)(3437)]/(584) = 56.34$


or 56 to the nearest integer. 

Substituting 56 for the missing observation, the analysis proceeds 
exactly as in case (3), with 38 instead of 39 degrees of freedom for the 
residual sum of squares and using formula (8.21) instead of (6.11) to
obtain the values of the c’s. The value so calculated (0.01717) differs
only slightly from that given by formula (6.11) (0.01674), but the 
more accurate formula (8.21), appropriate to this case, requires only 
simple calculations and presumably should be used. 

Conclusion. The primary purpose of this paper is to bring together a 
number of methods of dealing with the analysis of balanced designs 
when some observations are missing. The discussion is focussed on a 
specific design, but the methods and devices can be used, with suitable 
modifications, in a much wider range of problems. 

Another purpose may also be served. Apparently some suspicion 
attaches to the “estimation of missing values” as being an attempt to 
extract something from nothing. Perhaps the phrase itself, although 
warranted, is misleading. In any event, it is hoped that the foregoing 
discussion emphasizes sufficiently the fact that the basis and method of 
analysis are the same in all cases, and that the cases singled out here for 
discussion are special only in the sense that, in them, some simplification
of the arithmetic is possible.

DR. VICTOR SELDEN CLARK*


WITH the death of Dr. Victor Selden Clark in the 77th year of his
age, the country has lost one of its outstanding economists. He
was not concerned either with economic theory or with statistical
methods. He was primarily an economic analyst, making use of all
available sources of information concerning the several subjects which
he investigated and interpreting them in masterly fashion.

Through much of his career, Dr. Clark devoted his attention mainly 
to economic conditions and trends in those areas of the world concern- 
ing which adequate information was not readily available—outlying 
possessions of the United States, Latin American and Far Eastern 
countries, and the “down under” lands of Australasia. Most of his time 
from 1900 to 1913 was spent on studies of these areas. The emphasis 
was on labor problems but the whole economic picture of each area was 
sketched. In no case did he rely mainly on published information, of- 
ficial or unofficial; he visited each area and obtained most of his knowl- 
edge of it from personal observation and interviews with all classes of 
people. He may well be characterized both as a “globe trotter” and as 
a past master of the reporter’s art. The areas covered by his researches 
included Puerto Rico and Hawaii (in both of which he for some time 
held high-ranking administrative positions), Philippine Islands, Cuba, 
Mexico, Java, New Zealand, Australia, and Canada; incidentally he 
visited Russia and Spain. The results of this work were for the most part 
published by the Bureau of Labor Statistics. Each of the monographs 
which Dr. Clark prepared for that Bureau contains from 100 to 150 
closely-packed pages and was, at that time, the principal source of in- 
formation on the given area. They included critical discussion of prob- 
lems in addition to factual material. 

Toward the end of his career, Dr. Clark reverted again to globe- 
trotting, not only revisiting practically all those areas where he had 
previously made investigations, but also travelling extensively in other 
countries of the Far East and Latin America. For the most part, the 
results of these later travels and researches were not published. How- 
ever, the investigation concerning Puerto Rico, which he and colleagues 
conducted under the auspices of the Brookings Institution during the 
years 1928-30, constituted the basis for a large volume Puerto Rico 
and Its Problems, which is the most valuable one source of information
ever compiled concerning that island and its difficult economic,
social, and political problems.

* Born at Portageville, N. Y., June 12, 1868, son of Major Selden N. and Helen E. (Davis) Clark.
Litt.B., U. of Minn. 1890; Ph.D. Columbia, 1900; died April 1, 1946, Washington, D.C.; unmarried.

In the interval between these two periods of research in outlying 
countries, Dr. Clark conducted a monumental investigation concerning 
the economic history of this country itself, under the auspices of the 
Carnegie Institution. The first volume, History of Manufactures in the 
United States, 1607-1860, was published in 1916. The work was then 
interrupted by the war and the second volume covering the period 
1860-1914 was not issued until 1927. The two volumes together contain 
nearly 2,000 pages and were exhaustive in scope and illuminating in 
analysis. Professor Henry W. Farnam said in his introduction to the
second volume: 

... With rare persistence and industry he has succeeded in completing 
the work which is now offered to scholars. Like the History of Manufactures 
to 1860, this study is based on original material. It is an economic history in 
the strict sense of the word. It does not deal with technology and mechanics; 
it does not give biographies of prominent manufacturers; it can not cover 
the details found in histories of specific industries. It does give an interpre- 
tation in broad outlines of the development, the organization and the 
economic interactions of manufacturing industry in our country during a 
truly remarkable period. 


In preparing this history of American manufactures, Dr. Clark made 
use of an immense variety of material, ranging from official statistics 
of the Federal and State governments and of trade associations to items 
in periodicals and personal reminiscences. He traced the effects of 
tariffs on the development of American industry, the gradual spread of 
manufactures from the northeast to the central, southern and western 
sections of the country, the progress of the working classes, the develop- 
ment of great corporations and combinations, and all the other major 
aspects of the history of industry. 

In What Is Money, a compact 88-page book published in 1934—a 
time when the country was passing through a maze of monetary dis- 
cussion—Dr. Clark gave an elementary explanation of money and 
its relation to prices. His approach to the subject, as he said in his 
preface, was “by the path of history rather than by that of theoretical
analysis.” Attractively written, with frequent touches of humor, his
little book might well continue to be a valuable guide for the “man in
the street” or non-professional reader.

During the years 1920-28 Dr. Clark was editor of The Living 
Age, Boston, a periodical of long standing and wide circulation, 
which, however, was ultimately discontinued as other “readers’ digests”
became more popular. During his later years Dr. Clark maintained an
office in the Library of Congress, where he not only pursued his own
investigations but served as a consultant to the Library regarding
economic and governmental publications.


E. Dana Durand
Washington, D. C.

CORRINGTON CALHOUN GILL
1898-1946


THE death of Corrington C. Gill in Tucson, Arizona, on July 13,
1946, closed a distinguished career as government economist,
statistician and administrator. Gill played a significant part in shaping and
executing governmental policy during the long depression and the war
period.

Gill was born in Grand Rapids, Michigan, in 1898. His early life 
was spent in Michigan. Entering the Navy in 1917, he served through- 
out the war in the destroyer service on the French coast. He received 
his A.B. degree at the University of Wisconsin in 1923. 

In 1923 he came to Washington as Manager and Correspondent of 
the Washington D. C. Press Service. After four years he turned to 
independent research and business consulting work. In 1931 he became 
economist and statistician for the Federal Employment Stabilization 
Board. 

In the Spring of 1933 Gill was selected by Harry Hopkins to direct 
research, statistical and finance activities for the newly-created Federal 
Emergency Relief Administration. He occupied a key position in fed- 
eral relief agencies for the following eight years, serving successively 
as Director of the Division of Research, Statistics and Finance of 
F.E.R.A.; Assistant Administrator of F.E.R.A.; Assistant Administra- 
tor of the Civil Works Administration; and Assistant Commissioner 
of the Work Projects Administration. The excellent administrative 
record of the emergency organizations during this period was in large 
measure the result of his efforts. 

In 1941 Gill left W.P.A. to become Deputy Director of the Office of 
Civilian Defense in charge of Operations. The difficult job of organiz- 
ing regional, state and local civilian defense operations throughout the 
nation was performed with his usual speed and effectiveness. In 1942- 
1943 he served for a year as Consultant to the War Department, mak- 
ing a complete survey of the Army Medical Corps and assisting in its 
reorganization. 

In the Spring of 1943 Gill was chosen by the President to become 
Director of the Committee for Congested Production Areas, assisting 
localities with war-swollen populations to obtain urgently needed 
facilities and community services. When the C.C.P.A. was liquidated, 
he became Deputy Director General of the United Nations Relief and 
Rehabilitation Administration in charge of Administration and Finance.
He held this position until late 1945 when illness necessitated
his resignation.

He is survived by his widow, Julia Turnbull Gill. He is the author of
numerous economic and statistical articles and of two books, Wasted
Manpower, and The Challenge of Unemployment.

Those who knew Corrington Gill and worked with him shall miss 
him a great deal. An unusually competent administrator, he had a rare 
ability to bring out the best in those who worked under him. Numerous 
new developments and techniques in research and statistics were de- 
veloped under his direction and with the assistance of his wise judg- 
ment and shrewd sense of timing. His ability to complete stupendous 
and urgent tasks competently and quickly is rare, in Washington as 
elsewhere. 


Howard B. Myers
Committee for Economic Development

BOOK REVIEWS


Edited by
Oscar Krisen Buros
Rutgers University


Statistical Abstract of the United States, 1944-45: Sixty-Sixth Number. Com- 
piled by the Bureau of the Census. Washington 25: Government Printing Office, 
1945. Pp. xiii, 1023. $1.75. 


Review by Bruce D. Mudgett
Professor of Economics and Statistics 
University of Minnesota 


ON FIRST thought a review of the Statistical Abstract of the United States
seems almost in a class with writing a literary account of the monthly 
telephone bill, and the present writer had that reaction upon being asked 
to write this note. But maybe on further consideration there is something 
to be said, if not too much, about a recurrent annual volume summarizing 
the vast quantities of statistical information that pour out each year from
and about our national government and our national economy. 

It should be said at the start that the present volume is the current result 
of a development or evolution of many years (66 in fact) in presenting a single 
volume of brief summary information that is in wide demand. Of the things 
that would seem to deserve notice in a technical comment, I do not conceive 
the factual content of the volume deserves a very important place—and 
that mainly because most people who will have occasion to use the Statistical 
Abstract are likely to have a good general idea of the sort of information that 
is available, information such as is collected and published by different 
subdivisions of government and by some non-governmental agencies. 
Illustrations: population, vital statistics, prices, foreign and domestic 
trade, agricultural, manufacturing and mining production and the like. 
There have been additions to the factual content as the years have passed 
and a cursory examination of the present volume indicates that some of the 
recent additions have come from the sample studies developed in the last 
few years. For example, section five on the labor force contains tables con- 
structed from data obtained in the monthly report on the labor force—a 
very new and very important type of inquiry now being carried on in the 
Census Bureau. 

Since the Statistical Abstract is a compendium of summary statistical in- 
formation in a vast number of fields, the most important thing about it 
would seem to be how well has it done the job it is designed to do. This is a 
matter, largely, of the organization of the book and of the construction of 
the individual tables so that users can find what is there and can
understand it when found. A new feature in this volume, and a commendable
innovation is “the presentation of a general note at the beginning of each
section in order to furnish a background for the statistics in the field—.” 
Such notes have been prepared for fifteen of the thirty-four sections and fur- 
ther extensions are promised in future editions. This feature plus one other 
makes for easy location of available information on any topic. The other 
is an extensive and carefully constructed index. Furthermore the source 
reference for each table (of which there are over one thousand) is an improve- 
ment over what used to be, in that the source reference gives the publication 
in which the original data appeared, as well as the agency which published 
it. For example, table 80 of this volume giving death rates per 1,000 popu- 
lation by sex and age-groups 1900-1943 lists the source as “Dept. of Com- 
merce, Bureau of the Census; Vital Statistics Rates in the United States 
1900-1940; basic figures 1940-1943, annual report, Vital Statistics of the 
United States, Part II.” In contrast the source reference for the correspond- 
ing table of the 1931 Abstract (there it is table 76) merely lists “Bureau of 
the Census, Dept. of Commerce.” In the front of the volume is a list of all 
tables classified by source; thus all tables obtained from the Bureau of Agri- 
cultural Economics are listed under that title in this classification. This 
year’s practice on source reference is a great improvement for anyone 
who wishes to seek more complete data in the original source. 

The feature of most importance in the construction of individual tables is 
the appropriate arrangement of classifications in stub and caption and the 
adjustment of spacings, lines, differences in type, etc., in such a way as to 
bring out the information contained in the figures. We have all seen
examples of badly constructed tables where the effort was almost more than
it was worth to discover what the figures meant. In this respect the Statistical
Abstract has for many years done a reasonably good job, which could in at 
least one respect be improved; but I suspect the improvement would run 
counter to long-established habits in the Government printing office and 
therefore, would have a great inertia of habit to overcome before it could be 
introduced. I refer particularly to the use of different weights of line and 
sometimes differences in type in tables to help to show coordinate and 
subordinate relationships among the figures. There is evidence of some 
tendency in this direction, but it could go much further, to the great im- 
provement of the abstract. Consider for example, table 289 on Income Tax 
Returns to States and Territories for 1939, 1940, 1941. In the stub
classification, U. S. totals and section designations are in heavy faced type,
whereas individual states (subdivisions of the sections) are in lighter type, 
equally for the figures referring to these several categories. But it should be 
noted that in some tables the heavy faced type is scarcely distinguishable 
from the lighter faced variety. This, of course, is a printer’s problem, but it 
does deserve more attention than it has received in our government printing 
office. Statistical volumes in many European countries have for years been 
vastly better than ours on this score. Again illustrating from table 289, 
these tables make much less use of different weights of line than is desirable,
and again the proof lies in the better job done in European printing offices.
Table 289 would be greatly improved if, for example, the lines between 1939 
and 1940 were much heavier than the two lines within the 1939 category; 
and similarly for other years. 

The above criticism that the Statistical Abstract does not make enough use 
of differences in type and different weight of lines is, I think, deserved, and 
here is one chance for improvement on the technical side of presentation. 
Despite this objection, the government has placed within some thousand 
pages a vast body of summary statistical data and has organized it in such 
a way as to make it readily available to its potential users, a truly
monumental achievement.


Statistical Analysis for Students in Psychology and Education. Allen L. Edwards 
(Associate Professor of Psychology, University of Washington). New York 16: 
Rinehart & Co., Inc. (232 Madison Ave.), 1946. Pp. xviii, 360. $3.50. 


REVIEW BY DAVID A. GRANT
Associate Professor of Psychology 
University of Wisconsin 

EDWARDS’ Statistical Analysis is not just another elementary statistics
book. Unusual features and clean breaks with tradition are to be found
in both structure and content, and several of the unique features are to be 
applauded. For example, one-third of the book is given over to the statistics 
of small samples, and two chapters are devoted to analysis of variance. The 
reviewer agrees with Edwards that “... whether the traditional attitude 
approves or not, more and more research as published in psychological and 
educational journals is being evaluated by small sample techniques.” Other 
good points about Statistical Analysis include clearly written elementary 
chapters on “Probability and Frequency Distributions” and “Research and 
Experimentation,” modern sampling treatment for the correlation coeffi- 
cient, the use of the Charlier checks in computation, and a chapter on ele-
mentary algebra. This last-mentioned chapter is perhaps a bit too elementary,
omitting logarithms and exponents. Some teachers of statistics may have
mixed feelings about teaching statistics to students whose mathematical 
backgrounds are so inadequate as to permit them to profit from such a simple 
review of algebra. 

One can scarcely approve of one of the unusual features of the book, the 
deliberate omission of any treatment of the reliability of test scores. To 
begin with, this material has unique value for students in psychology and 
education. Its absence detracts materially from the usefulness of Statistical 
Analysis as a general elementary textbook in statistics for psychologists and 
educators. Furthermore, in spite of Edwards’ comment that it does not 
“...fall within the general orientation of the rest of the book...,” 
fallibility is a property of all actual mensuration, including reflex magnitudes, 
discrimination limens, reaction times, and other measures from “pure”
psychology. To omit the topic of reliability even from an experimentally 
oriented statistics book such as Statistical Analysis would seem to encourage 
superficial interpretation of the nature of measurement in general. 

In the reviewer’s opinion, Statistical Analysis suffers from a number of 
pedagogical weaknesses. These include: (1) an undesirable multiplicity of 
definition and computational formulae which seems bound to confuse the 
elementary student; (2) instances of unnecessary notational complications, 
such as the use of v for σ², and z for the critical ratio or significance ratio which
requires the following footnote (p. 186): “Fisher has introduced a transforma- 
tion of r into another statistic which is known as z (not to be confused with 
the z mentioned earlier)...”; and (3) introduction of Peters and Van
Voorhis’ ε² as an alternative to F in analysis of variance, accompanied by the
statement, “You may wonder at this point whether F or ε² should be used
as a test of significance in a given problem. There is no definite answer to 
this question. ...” (p. 236). It seems to the reviewer that an elementary 
textbook should avoid such sources of confusion. 

The reviewer’s chief criticism centers around the general emphasis of the 
book. The reader, intrigued by the list of chapter topics, and led on by a 
promise in the preface that emphasis is to be upon use and function rather 
than calculation, is doomed to a real disappointment as he works his way 
through calculation after calculation, coding technique after coding tech- 
nique. On the topic of correlation, for example, 12 pages are expended on 
the development and use of definition and computational formulae, but only 
two and one-half pages are given to interpretation of r, and most of this is 
rather superficial. In this respect, it would seem that Edwards is caught up 
in precisely the very “vicious circle” he promises, in the preface, to break. 

With respect to statistical errors, Edwards’ book is very much like the 
first editions of most elementary statistics texts which reach the market. 
The slips are disconcerting, but minor. They include: (1) the statement 
that computation of the probable deviation requires computation of the 
S.D. (p. 50); (2) an inaccurate technique used to obtain fiducial limits for a 
proportion (p. 169); (3) a remark that grouping will cause no serious error 
in the S.D. (p. 70); (4) a formula for t which holds only when n₁ = n₂
(p. 182); (5) an erroneous assertion about a, (p. 185); and (6) an interpre-
tation which accepts the null hypothesis when χ² does not exceed the 5 per
cent point (p. 247). 

Statistical Analysis is better than many current statistics books. But, be- 
cause of the points mentioned above, the reviewer does not think that it 
will prove to be an excellent teaching text. He does not rate it in the same 
class with the elementary texts of Lindquist and Helen Walker. Edwards’ 
book is, in many respects, a more forward-looking text, but it lacks the teach- 
ing finesse and the depth of interpretation of its best available competitors. 


Income from Independent Professional Practice. Milton Friedman (Associate 
Professor of Economics, University of Chicago) and Simon Kuznets (Professor of 
Statistics and Economics, University of Pennsylvania). Publications of the Na-
tional Bureau of Economic Research, No. 45. New York 23: National Bureau of
Economic Research, Inc. (1819 Broadway), 1945. Pp. xxxiii, 594. $4.50. Two
reviews follow:

REVIEW BY R. L. ANDERSON
Associate Professor of Experimental Statistics 
and Agricultural Economics, Institute of Statistics 
University of North Carolina, Raleigh 


THIS book summarizes the results of several years’ study of the net
income from independent practice by physicians, dentists, lawyers,
certified public accountants and consulting engineers. The authors have
made use of many modern statistical techniques such as the χ²-test, the
analysis of variance and multiple regression, which in the past have been 
applied too infrequently to the analysis of economic data. Dr. Friedman’s 
techniques for the analysis of ranked data have also been utilized to good 
advantage in several places. 

Because of the wide usage of modern statistical techniques, this book 
should receive careful scrutiny by economic statisticians. The authors are 
careful to point out the basic assumptions behind such tools as the analysis 
of variance, assumptions which are too often not fulfilled with economic 
data. Such a treatment should help immeasurably to promote the idea of 
finding means of altering the old statistical tools or of devising new ones in 
order that economic data can be analyzed more objectively than in the past. 
While some criticisms will be made below of the statistical methods em- 
ployed, these criticisms in the main are minor in character and the general 
quality of the statistical methodology is well worth the attention of econ- 
omists. 

The first two chapters describe the professions studied and the methods 
used in obtaining the samples and in adjusting for known biases in the sam- 
ple results. Unfortunately, the authors did not have available some of the 
modern techniques of conducting mailed questionnaires, which were used 
exclusively in this study. It was concluded that the nonresponse problem 
was not important in this survey, because no correlation was found between 
income and response. However, the geographical distributions of the sample 
responses did not correspond with those of the population. It is the reviewer’s 
opinion that such an extensive study should have been based on a more 
solid sampling foundation with some provision made for sampling the non- 
responders by direct interview. Also some provision should have been made 
to check on the accuracy of actual replies. For example, did deductions from 
gross income include only business and not personal expenses? 

The authors present a detailed analysis of the adjustments necessitated 
by such things as the exclusion of all but members of the American Dental 
Association in the dentist’s sample and the apparent size of community bias 
in the samples for doctors and lawyers. The sample and population fre-
quencies were compared by use of the χ²-test for goodness-of-fit and of χᵣ², the
latter being Friedman’s test for rank correlations. The attempts to justify
the results on the basis of previous state surveys seemed inconsistent with
the previous statement that the sample results were not good for separate 
geographical units. 
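
The rank test mentioned above can be sketched as follows. This is a minimal illustration of the usual textbook form of Friedman's statistic, not the authors' own computation; the function name and the data are hypothetical, and ties within a row are assumed absent.

```python
def friedman_chi_r2(table):
    """Friedman's chi_r^2 for n rows (blocks), each ranked across k columns.

    Assumes no tied values within a row; ties would need midranks and a
    correction that this sketch does not attempt.
    """
    n = len(table)
    k = len(table[0])
    rank_sums = [0] * k
    for row in table:
        # Rank the k values in this row from 1 (smallest) to k (largest).
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 * sum_sq / (n * k * (k + 1)) - 3.0 * n * (k + 1)


# Hypothetical data: rows are regions, columns are size-of-community classes.
incomes = [
    [2.1, 3.4, 4.0],
    [1.8, 2.9, 3.7],
    [2.5, 3.1, 4.4],
]
stat = friedman_chi_r2(incomes)
print(stat)
```

The statistic is referred to the χ² table with k − 1 degrees of freedom when the number of rows is not too small.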

Chapters 3 and 4 contain the pertinent conclusions of the study, compari- 
sons between professional and other income and comparisons of income 
among the five professions. Average professional income was about three 
times as large as that for people not engaged in one of these professions. 
This result was largely explained by the more expensive and longer training 
required of professional men, the larger communities (with larger average 
income for any work) which attracted professional men, the higher ability 
required for professional work, and the noncompetitive nature of the pro- 
fessions. My opinion is that too little importance was placed on the factor 
of ability. However, the restriction of entry and the noncompetitive nature 
of the professions cannot be stressed too much. These are factors for which 
an adequate statistical analysis is not available; hence, the possibility for 
personal bias in the evaluation of causes is quite important. One also 
wonders what the inclusion of educators would have done to the over-all 
professional picture as herein presented. 

For all of the income data, it was found that the average incomes and 
their standard deviations were highly correlated, indicating that it might 
have been better to assume logarithmic relationships in the analyses. Lorenz 
curves were used to study relative variability among professional incomes 
and between professional and nonprofessional incomes. The five professions 
ranked as follows as regards average income (high to low): engineers, ac- 
countants, lawyers, doctors and dentists. There was little difference between 
the average income for lawyers and doctors. 
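
A small numerical sketch may show why a correlation between means and standard deviations points toward logarithms. The figures below are hypothetical and are built so that the groups differ only by a scale factor; they are not taken from the book.

```python
import math
import statistics

# Hypothetical incomes: three groups that differ only by a scale factor,
# so each group's standard deviation is proportional to its mean.
base = [1.0, 2.0, 3.0, 4.0, 5.0]
groups = {scale: [scale * v for v in base] for scale in (10.0, 20.0, 40.0)}

raw_summary = {}
log_sd = {}
for scale, values in groups.items():
    raw_summary[scale] = (statistics.mean(values), statistics.stdev(values))
    # After taking logarithms the scale factor becomes an additive constant.
    log_sd[scale] = statistics.stdev([math.log(v) for v in values])

for scale in groups:
    mean, sd = raw_summary[scale]
    print(f"scale {scale}: mean={mean:.1f} sd={sd:.2f} sd of logs={log_sd[scale]:.4f}")
```

In the raw units the standard deviation rises with the mean; in logarithms it is the same for every group, which is the condition the analysis of variance prefers.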

A detailed analysis was made of the difference between dentists’ and doc- 
tors’ incomes. This reviewer tends to agree with a comment made by C. R. 
Noyes (pp. 405-10) to the effect that the assumptions made about the 
differences between these two professions were of such an arbitrary nature 
as to render highly untenable the statement that doctors restrict entry into 
their profession more than dentists. A multiple regression was run for the 
doctors’ average income in each state using the percentage of doctors in the 
population and the per capita income as independent variables; similarly, 
for dentists. This kind of analysis of economic data should be used much 
more. In this case the authors were able to obtain the elasticity of demand 
for services for a given change in the percentage of doctors and dentists in 
the community, holding per capita income constant. 

Chapters 5 and 6 present a detailed analysis of several causes of income 
differentials. The analysis of variance with the method of expected subclass 
numbers for disproportionate frequencies might have been profitably used 
to test for differences among regions and among communities of different 
sizes. Instead the authors utilized the χᵣ² method mentioned above. Data
were too scanty to discuss the important topic of influence of training and 
inherent ability on professional income. The importance of such intangibles
as family connections and political influence is not stressed enough.

Chapter 7 discusses the stability of relative income status for the same
individual in different years. A new treatment of the problem is presented 
here whereby income is said to consist of permanent, transitory and quasi- 
permanent components. If this analysis can be coupled with some estimate 
of the variability of the separate components, it might prove of value in 
locating the cause of the extreme variability in professional incomes. Linear 
regression analysis was used to compare incomes in successive years. This 
reviewer believes that any recommendations based on annual trends are 
extremely hazardous because of the correlations due to general business 
conditions. It is suggested that a multiple regression analysis utilizing some 
general business index as another independent variable is needed. The au- 
thors use the analysis of variance to test for nonlinearity. They are quite 
perturbed about the normality requirement for the analysis of variance, 
but the correlation of mean and variance is a much more important disturb- 
ing factor. 

Chapter 8 presents some generalizations on the temporal changes in in- 
come for each profession and for business as a whole. The most important 
conclusion here is that there is a decided positive correlation between changes 
in professional and in general business incomes. 


REVIEW BY ZENON SZATROWSKI 
Instructor in Economics, Northwestern University 
THIS study deals with the incomes from independent practice in five pro-
fessions, medicine, dentistry, law, certified public accountancy, and 
consulting engineering; however, intensive analysis is applied to the more 
adequate data of medicine and dentistry. The data, obtained from question- 
naires sent to sample groups, covers the period from 1929 to 1936. Presenta- 
tion of these statistics in the volume, though a significant contribution, is 
only a small part of the contents. Analyzing differences in income and 
measuring the influence of factors giving rise to these differences appears to
be the principal objective. To quote the authors (p. 63): “But it is not
enough merely to name factors making for variability in income. To be use- 
ful the catalog must be quantitative as well as qualitative; the importance 
of the different factors and the direction and magnitude of their influence 
must be measured.” 

Pursuing this objective, the authors use statistical techniques which are 
relatively simple but adequate for the problem. Pertinent characteristics of 
the distributions of incomes are obtained and used to describe differences in 
professions and variation from year to year. Arithmetic averages, medians, 
and quartiles are employed as estimates of general level of income; the inter- 
quartile range, standard deviation and Lorenz curves serve to measure vari- 
ability in incomes. These statistical measures are used as a basis for compar- 
ing incomes from independent practice with incomes from salaried em- 
ployment in the same professions, with incomes in other occupations and, of 
course, differences between the professions. Correlation techniques are used 
to measure the stability of incomes from year to year and the relationships
between income and its determinants, such as size of community, years of 
practice, etc. Index numbers are derived in order to describe temporal 
changes in the level and variability of incomes for the different professions. 
The reliability of the data itself is discussed at length. The objective tests 
of reliability which are used are based on a comparison of sample data with 
samples of other studies and the universe, with application of the χ² test.

The approach followed by the authors is to give a relatively complete 
discussion of the problem in connection with each step in their statistical 
analysis. This is desirable because it enables an individual with little sta- 
tistical training to get a good deal of information about the subject. In addi- 
tion, in this way, the authors present the reasoning which justifies their 
statistical procedure. For example, a priori they derive the determinants of 
income by considering the factors which influence the demand, the supply 
and the nature of the competition in connection with the different pro- 
fessions. 

In their discussion the authors are careful to point out the limitations and 
shortcomings of their data and the statistical techniques they employ. The 
problem, from their standpoint, is to extract as much information as possible 
from the relatively inadequate data which is available. They do this compe- 
tently. The value of the book lies not merely in the information obtained 
from the data which was available. The procedure which is developed can 
be used profitably as a pattern for both a further study of more complete 
data on the same professions, and also in connection with studies of other 
income groups. 


Mathematics of Investment, Third Edition. William L. Hart (Professor of 
Mathematics, University of Minnesota). Bound with Tables for Mathematics of 
Investment, Third Edition. Boston, Mass.: D. C. Heath & Co., 1946. Pp. vii, 304; 
ii, 126. $3.60. 
REVIEW BY LLOYD A. KNOWLER
Chairman, Department of Mathematics 
State University of Iowa 


THE text consists of the three parts: Part I, Annuities Certain; Part
II, Life Insurance; and Part III, Auxiliary Topics. Essentially a
fourth part is provided by a rather extensive set of tables for annuities cer-
tain and a few functions of the American Experience Table of Mortality
with interest rate of 3½%.
The main features by which this edition differs from the preceding one are 
outlined in the preface somewhat as follows: 


An alteration has been made in the presentation . . . with the aim of giving 
the student maximum acquaintance with .. . applications before he meets 
the more unusual situations. ... The exercises have been freshened... . 
The format of the text and tables has been changed. ... 


In carrying out “The primary aim to adapt the material, particularly that 
dealing with annuities certain, to the needs and the ability of the typical 
student in a college of business administration” the author has attempted
to keep as the “minimum prerequisite ... a substantial second course in
high school algebra” and as a “maximum prerequisite (with certain excep- 
tions) a standard first course in college algebra.” 

Part I dealing with annuities certain covers pretty much the material in 
standard texts of mathematics of investment or finance. As with the pre- 
ceding edition, it is rather well written. A nice collection of 106 review prob- 
lems is given at the end of this part. 

Part II dealing with life insurance consists of three chapters, namely: 
Life Annuities, Life Insurance, and Policy Reserves. On page 204 we find 
that “Table XIV (The American Experience Table of Mortality) was formed 
from the accumulated experience of many American life insurance com- 
panies.” The accuracy of this statement is doubted in that it has been previ- 
ously reported that the mortality statistics were deduced from experience 
of the Mutual Life Insurance Company of New York, but that the figures 
were inadequate at the older ages and so they were arbitrarily adjusted some- 
what. Also, in the preface, we observe that “the notation and emphasis in 
Part II have been oriented thoroughly with respect to existing actuarial 
customs.” Some sources might question the use of the American Experience 
Table at 3½% interest as conforming to this custom. The use of this table at
the interest rate indicated has no effect, however, upon the underlying prin- 
ciples. This part is meant, apparently, to be no more than a mere introduc- 
tion to the elementary principles of the mathematics underlying life in- 
surance. 

Among the auxiliary topics in Part III is a discussion of computation and 
logarithms; progressions; and an appendix consisting of computation of 
interest by use of binomial expansion, force of interest, comparison of com- 
pound and simple interest for fractional interest period, certain interpola- 
tions, reference to depreciation charges, and abridged multiplication—a 
valuable aid which in the opinion of the writer should be used in those classes 
where appropriate tables or computing machines are not available. 

Although several good textbooks exist in this area, it is worthwhile to 
consider this one for use in an introductory course. It will, no doubt, be 
adopted by a number of institutions. 


Theory of Functions: Part 1, Elements of the General Theory of Analytic Func-
tions. Konrad Knopp (Professor of Mathematics, University of Tübingen,
Germany). Translated from the fifth German edition by Frederick Bagemihl 
(Instructor in Mathematics, University of Rochester). New York 19: Dover 
Publications (1780 Broadway), 1945. Pp. vii, 146. $1.25. 


REVIEW BY EDMUND CHURCHILL
Instructor in Mathematics, Rutgers University 


THE dependence of modern statistical theory on advanced mathematical
tools makes it increasingly desirable that the statistician have a 
background in mathematics that goes well beyond the traditional courses in 
calculus and algebra. Some knowledge of the theory of functions of both real
and complex variables, for example, is certainly essential to a complete un- 
derstanding of such books as those by Kendall and Wilks on statistical theory 
and of most of the literature on distribution theory. The statistician not 
already acquainted with the elements of complex function theory will find 
this little book by Knopp an excellent introduction to the subject. It is a 
remarkably readable book, concise, clear, and rigorous. A number of brief 
illustrations and alternate forms of definitions help to clarify important 
points. 

While this book was not written especially for the statistician, the material 
which it contains is so basic as to be valuable in its entirety to the statis- 
tician. The subjects treated consist of numbers and points, the general con-
cept of a function of a complex variable, integrals of complex functions,
Cauchy’s integral theorem and formulas, Taylor and Laurent series, analytic 
continuation, singularities, and the residue theorem. Perhaps half of these 
topics are involved, for example, in calculating the moments of the normal 
distribution by means of its characteristic function. The residue theorem 
often offers the simplest method of calculating integrals that arise in statis- 
tical problems. Other problems of statistical importance require a good deal 
more function theory than could be included in a book of this size, but a 
knowledge of what is in Knopp’s book should enable one to comprehend 
most of the advanced theory in the more complete treatises when it is needed. 
What, in turn, is needed for an understanding of this book is roughly the 
equivalent of three semesters of calculus, including a good grasp of the con- 
cepts of convergence and continuity. As a book for self study, this book 
suffers from the small number of exercises included. A person using the book 
for this purpose would do well to obtain the first volume of Knopp’s Auf- 
gabensammlung zur Funktionentheorie (Berlin: Walter de Gruyter & Co., 
1923), a collection of several hundred exercises on this material with their 
solutions, which, incidentally, will put little strain on the reader’s German. 
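
As a rough illustration of the moment calculation mentioned above, the following sketch uses notation that does not appear in the review: X is a normal variable with mean \mu and variance \sigma^2, and \varphi is its characteristic function. The first two moments follow by differentiating at the origin.

```latex
\begin{align*}
\varphi(t) &= E\,e^{itX} = \exp\!\left(i\mu t - \tfrac{1}{2}\sigma^{2}t^{2}\right),
&
E\,X^{k} &= i^{-k}\,\varphi^{(k)}(0), \\
\varphi'(0) &= i\mu
&&\Longrightarrow\quad E\,X = \mu, \\
\varphi''(0) &= -\left(\mu^{2}+\sigma^{2}\right)
&&\Longrightarrow\quad E\,X^{2} = \mu^{2}+\sigma^{2}.
\end{align*}
```

Evaluating the integral that defines \varphi is where complex integration enters, and the expansion of \varphi about the origin is a Taylor series in the sense of the book.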

The physical appearance of the book and its translation are good; the 
price is moderate. Similar publication of several other volumes in the Ger- 
man “pocket book” series would constitute a real service to students of both 
statistics and pure mathematics. 


Curve Fitting for Students of Economics. Brij Narain (Professor of Economics, 
Sanatana Dharma College, Lahore, India). Lahore, India: S. Chand & Co., 1944.
Pp. viii, 197. Rs. 10/-. 


REVIEW BY JOHN H. SMITH
Acting Chief Statistician, Bureau of Labor Statistics 
Washington, D. C. 


IN THE first four chapters Narain presents elementary applications of least
squares to original data and to their logarithms. The fifth chapter is de-
voted to population concepts such as life tables, net and gross reproduction 
rates, and age composition. In the last two chapters is presented a method of 
fitting curves to population data and to frequency data in ogive form to
which “the attention of the reader is specially drawn.” If x represents popu-
lation, an arbitrary time origin is chosen at which x = a/2. The trend value
of x at time t is determined from the polynomial trend value of K, the
average rate of change in log x − log (a − x), from the origin to t. The presen-
tation is made more difficult to understand by the fact that the author 
calls a the “maximum population” when most of the examples show popula- 
tion for which the trend declines before reaching this “maximum.” This 
method might be adapted to special types of problems. Even if less flexible 
trend functions were chosen for K, however, the method could not replace 
forecasts of population based on trends in birth and death rates and changing 
age composition of the population. Except for this specialized type of curve 
and the use of Indian data with accompanying discussion, this book has 
little in it which would be of interest to readers who are familiar with the 
much more comprehensive books by writers in England and the United 
States such as the ones by Croxton and Cowden, Ezekiel, Elderton, and
others.
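
The curve described in this review can be sketched from the reviewer's one-sentence account. What follows is a reconstruction under stated assumptions, not Narain's own procedure: natural logarithms are assumed, the quantity averaged is taken to be log x − log (a − x), and the function name and numerical values are hypothetical.

```python
import math


def trend_value(t, a, k_coeffs):
    """Trend population at time t, measured from the origin where x = a/2.

    k_coeffs holds the polynomial coefficients of K(t), constant term first.
    Natural logarithms are assumed.
    """
    k_t = sum(c * t ** i for i, c in enumerate(k_coeffs))
    # log x - log(a - x) is 0 at the origin, so at time t it equals K(t) * t.
    return a / (1.0 + math.exp(-k_t * t))


# Hypothetical values: a = 100 and K(t) = 0.05 - 0.001 t.
a = 100.0
coeffs = [0.05, -0.001]
path = [trend_value(t, a, coeffs) for t in range(0, 101, 10)]
print([round(x, 1) for x in path])
```

With a declining K the trend rises, turns, and falls without ever approaching a, which is the behavior the reviewer objects to when a is called the "maximum population."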


LETTERS ABOUT BOOKS 





Readers are invited to submit letters about statistical methodology books for
publication in this forum. Concise, informative letters which supplement 
previously published reviews by pointing out specific strengths, weaknesses, 
errors, and errata in currently used books are wanted. Criticisms based on 
actual use of a book as a text are especially desired from statistics instruc- 
tors. Other letters may consist of suggestions for the writing of books and 
reviews. Letters which contain adverse criticisms of JOURNAL reviews will 
be submitted to the author of the review for any reply he may care to make. 
Contributors are requested to avoid personalities. The right to decide whether 
a letter merits publication is reserved. Letters should be sent to the review 
editor, Oscar K. Buros, Rutgers University, New Brunswick, N. J. 





SEQUENTIAL ANALYSIS OF STA- 
TISTICAL DATA: APPLICATIONS 


A REVIEW by B. L. Welch of Sequen-
tial Analysis of Statistical Data: Ap- 
plications, contains the following sen- 
tence: “A basic criticism which has 
been made of Wald’s work is that fre- 
quently m will be about halfway be-
tween m₁ and m₂, and that therefore
appreciable average reductions in sam- 
pling size will not happen.” I should 
like to comment briefly on this. 

If it were true that a substantial 
part of a factory’s output had to be 
characterized by m about halfway be-
tween m₁ and m₂, much more would be
wrong with the sampling inspection 
scheme than merely failure to effect 
economies in inspection. The probabil-
ity of acceptance at m is of the order
½. (By this is meant that it is fairly
large. Its exact value of course depends
upon the various parameters involved.)
Now if product characterized by m is 
satisfactory, then either the producer
or the consumer has a legitimate griev- 
ance in that a substantial, satisfactory 
portion of the output is being rejected 
or completely inspected with the cost 
that this implies. If product character- 
ized by m is defective, then a substan- 
tial fraction of the output is defective, 
and these defective items are being dis- 
posed of partly by acceptance by the 
consumer and partly by rejection (at 
the producer’s expense, presumably). 
Such an economic setup cannot long 
endure and calls for changes more 
drastic than a change in sampling 
plans. It seems to me that in a produc- 
tion system which has reached at least 
the approximate stability where one 
can be concerned with such questions 
as minimizing average inspection, the 
bulk of the output must be in the 
neighborhood of m₂; consequently, the
criticism is invalid. 
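
The remark that the probability of acceptance at the intermediate point is of the order ½ can be checked against Wald's approximate operating characteristic. The sketch below assumes the case of a normal mean with known variance, where Wald's auxiliary parameter h is 1 at the good level, −1 at the bad level, and 0 halfway between; the function name and the risk values are hypothetical.

```python
import math


def wald_oc(h, alpha, beta):
    """Wald's approximate probability of accepting the product as good.

    alpha is the risk of rejecting good product and beta the risk of
    accepting bad product. h is Wald's auxiliary parameter.
    """
    log_a = math.log((1.0 - beta) / alpha)   # upper boundary, log A > 0
    log_b = math.log(beta / (1.0 - alpha))   # lower boundary, log B < 0
    if abs(h) < 1e-12:
        # Limit of (A^h - 1) / (A^h - B^h) as h tends to 0.
        return log_a / (log_a - log_b)
    a_h = math.exp(h * log_a)
    b_h = math.exp(h * log_b)
    return (a_h - 1.0) / (a_h - b_h)


equal_risks = wald_oc(0.0, 0.05, 0.05)
unequal_risks = wald_oc(0.0, 0.05, 0.10)
print(equal_risks, unequal_risks)
```

With equal risks the value at the halfway point is exactly one half under this approximation; with unequal risks it moves away from one half, in line with the writer's remark that the exact value depends on the parameters involved.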

Now m₂ denotes good product and
m₁ unsatisfactory product. Let the
process average be denoted by m 
which therefore lies near m₂. A Wald
sequential test minimizes inspection
at m₁ and m₂. It seems to me that a
next step should be to construct a test 
procedure which will have the pre-
scribed power at m₁ and m₂ and will
minimize inspection at the process 
average m. Using Wald’s equation for 
the power of a sequential test such a 
test procedure can at present be con- 
structed to a required degree of ap- 
proximation only at large cost in com- 
putational effort. Clarification of this 
problem should be of service to those 
engaged in acceptance sampling and 
would no doubt contribute to the 
theory of Wald tests. 

J. WOLFOWITZ, Associate Profes-
sor, Institute of Statistics, Uni- 
versity of North Carolina, Raleigh 


INDUSTRIAL STATISTICS 


IN RESPONSE to your request for letters
concerning recent books on statisti- 
cal methodology we comment below 
on H. A. Freeman’s Industrial Statistics 
(New York: John Wiley & Sons, Inc., 
1942). We feel that the title and wide 
distribution of this book have caused 
many statistical novices to read it, 
and that it is therefore important to 
correct some of its more important 
errors. For many of these, the author 
should not be held responsible; they 
have been collected from the litera- 
ture where they have the support of 
eminent statisticians. 

THE USE OF THE t TEST. On page 17
Freeman points out that “a large
value of t may reflect differences in
variances rather than differences in 
means” and then passes to a test of 
homogeneity of variance. Why not in- 
clude a simple, conservative version 
of the t test? One such is obtained by
estimating the variance of each mean 
from the corresponding sample, ad- 
ding these together to estimate the 
variance of the difference of means, 
and then using the smaller number 
of degrees of freedom. When the two 
samples are of equal size, the only 
change from the classical procedure is 
the halving of the number of degrees 
of freedom. Here one may point out 
that the “loss” of degrees of freedom
is an example of the balance of 
sensitivity vs. security which occurs 
throughout all statistical procedures. 
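
The conservative procedure just described can be sketched as follows. The function name and the two samples are hypothetical; the computation follows the steps in the letter, and the pooled form is included only to show that the statistic itself is unchanged when the samples are of equal size.

```python
import math
import statistics


def conservative_t(x, y):
    """Return (t, df) for the conservative two-sample t test."""
    nx, ny = len(x), len(y)
    # Variance of each mean, estimated from its own sample.
    var_mean_x = statistics.variance(x) / nx
    var_mean_y = statistics.variance(y) / ny
    diff = statistics.mean(x) - statistics.mean(y)
    t = diff / math.sqrt(var_mean_x + var_mean_y)
    # Use the smaller of the two numbers of degrees of freedom.
    return t, min(nx, ny) - 1


def pooled_t(x, y):
    """Classical t with a pooled variance, on nx + ny - 2 df."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * statistics.variance(x)
              + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    diff = statistics.mean(x) - statistics.mean(y)
    return diff / math.sqrt(pooled * (1 / nx + 1 / ny)), nx + ny - 2


# Hypothetical samples of equal size with visibly different spreads.
x = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3, 9.7, 10.6]
y = [9.1, 10.9, 8.7, 11.2, 9.5, 10.8, 8.9, 11.0, 9.3, 10.6]
t_cons, df_cons = conservative_t(x, y)
t_pool, df_pool = pooled_t(x, y)
print(t_cons, df_cons, t_pool, df_pool)
```

For samples of ten each, the two statistics agree and only the degrees of freedom differ, 9 against 18, so the conservative version simply asks for a larger tabled value of t.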

Immediately following the warning, 
Freeman analyzes an example where 
the variance ratio is 3.65, which is at 
about the 8 per cent point for 9 and 9 
degrees of freedom (two-tailed F test). 
The lack of homogeneity is surely sus- 
picious and might well be discussed, 
yet it is not. The 5 per cent point of t
on a conservative 9 df would be only 
2.262 instead of 2.101 on 18 df. This 
fact would help the reader or student 
who has to apply this technique in 
practice. We are reluctant to conclude 
from this warning and example that 
Freeman believes that whenever a test 
of significance fails to indicate differ- 
ent variances it is automatically safe 
to assume them the same, but a stu- 
dent might easily conclude this. 
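
The tail probability of a variance ratio can be checked numerically without tables. The sketch below is an illustration only: it uses the standard identity between the F distribution and the incomplete beta function, integrated by Simpson's rule, and the function name `f_upper_tail` and the step count are assumptions of this sketch rather than anything in Freeman's book.

```python
import math

def f_upper_tail(f, d1, d2, steps=2000):
    """P(F > f) for an F variate on (d1, d2) df, by Simpson's rule.

    Uses P(F > f) = I_x(d2/2, d1/2) with x = d2 / (d2 + d1*f),
    where I_x is the regularized incomplete beta function.
    """
    a, b = d2 / 2.0, d1 / 2.0
    x = d2 / (d2 + d1 * f)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(u):
        # Beta(a, b) density; zero at the end points for a, b > 1.
        if u <= 0.0 or u >= 1.0:
            return 0.0
        return math.exp((a - 1) * math.log(u) + (b - 1) * math.log(1 - u) - log_beta)

    h = x / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3.0

one_tail = f_upper_tail(3.65, 9, 9)
two_tail = 2 * one_tail
print(f"two-tailed probability for F = 3.65 on 9 and 9 df: {two_tail:.3f}")
```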

On page 62 we find “Had the entire 
set of grids differed significantly among 
themselves, the following procedure 
could have been used to determine 
whether or not the apparent best and 
second best grids differ significantly be- 
tween themselves.” After outlining 
the use of a t test after an F test, Freeman
points out that the difference
between the largest and the smallest 
will often appear “significant” due 
solely to error, and then warns the 
reader briefly that “A t test applied
to two means after over-all homo- 
geneity has either been refuted or not 
must be used with caution.” He might 
also warn the reader about the case 
where the F test has not been made. 
Yet on the next page he uses the t test
on the best and second best without 
comment! In noticing this point we 
do not wish to imply that a good solu- 
tion exists—we know of none—but we 
feel that the reader should have been 
warned. 

THE USE OF L₀ AND L₁ TESTS. On
page 17 Freeman introduces the L₁
test, for application to the case of two 
variances. From a pedagogical point 
of view there seems much to be gained 
by using a two-tailed F test, which is 
perfectly equivalent and part of a 
widely applicable test. At the top of 
page 17 the sample sizes nx and ny 
are potentially different, in the middle 
of page 17 the L₁ test is suggested, and
it is not until page 87 that we learn
that the author does not want to use
the L₁ test unless the sample sizes are
equal. What is one supposed to do
with two unequal samples? (Why not
use the L₁ test?)

On page 87 Freeman states the questions
which the L₀, L₁ and F tests are
supposed to answer, but he makes no 
statement of the very important fact 
that these tests assume that the differ- 
ent samples came from normal popu- 
lations whether or not the hypothesis 
tested is true. On page 91 these tests 
are then applied blindly to suspicious- 
looking data without inquiring as to 
normality. 

CONCLUSIONS FROM TESTS OF SIG- 
NIFICANCE. On page 24 we find that 
lack of statistical significance implies 
that “Hardness is really affected by 
stain, whereas breaking [bending] 
strength is not.” That is, from lack of 
statistical significance we conclude 
lack of real effect—this seems highly 
unsound! In the actual situation, it is 
extremely probable that whatever 
chemical action is involved in staining 
does have some average effect on the 
bending strength. While the data 
probably do support the conclusion 
that the effect of stain on bending 
strength is unimportant in practice, 
it is very important to distinguish 
between this conclusion and that 
stated on page 24. 

There is a general tendency to dis- 
cuss the case of two significance tests 
and a combined test in a confused or 
erroneous way. It is well known that 
if all tests are applied at the same level 
of significance the following possibili- 
ties arise: (a) all tests show significant 
effects; (b) one single test and the com- 
bined test show significant effects; 
(c) one single test shows a significant 
effect and no others do; (d) no single 
test shows a significant effect but the 
combined test does; and (e) no test 
shows a significant effect. 

Case (c) arises infrequently, but, as 
the example 4+1=5 for χ² on 1+1=2
degrees of freedom shows at the 5% 
level, it can easily arise in practice. 
What conclusion to draw when it does 
arise is a difficult and, we believe, un- 
settled problem. (In fact, the general 
question of drawing conclusions when 
multiple tests are applied seems to be 
almost ignored by Freeman.) The dis- 
cussion at the foot of page 61 is a case 
in point, although four tests not two 
are involved, for we are asked to conclude
that if relatively insensitive tests
for normality and for equality of vari- 
ance detect nothing, then a large F 
value must imply differences in the 
means (“it follows that the part of H 
which is untenable is”). This neglects 
all possibilities of type (c)! On page 89 
we have a similar situation: if the L₀
test is not significant, then the L₁ and
F tests “must both fail (i.e., show no 
significance).” 

Possibility (d) above is also hard to 
interpret; 3+3=6 illustrates its oc- 
currence for x? on 1+1=2 degrees of 
freedom. On page 90 we are told that 
such things cannot be, for if the L₀
test is significant and the L₁ test is not,
“hence they must differ in their
means.”
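
The two numerical examples, 4+1=5 for possibility (c) and 3+3=6 for possibility (d), can be verified directly. The sketch below uses the closed forms of the χ² tail for 1 and 2 degrees of freedom; the function names are hypothetical.

```python
import math

def chi2_tail(x, df):
    """Upper-tail probability of chi-square for df = 1 or df = 2 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")

def verdicts(x1, x2, level=0.05):
    """Significance of each single test and of the combined (summed) test."""
    return (chi2_tail(x1, 1) < level,
            chi2_tail(x2, 1) < level,
            chi2_tail(x1 + x2, 2) < level)

print(verdicts(4, 1))  # case (c): (True, False, False)
print(verdicts(3, 3))  # case (d): (False, False, True)
```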

This treatment of overlapping tests 
of significance seems entirely unsatis- 
factory. 

One of the most serious examples of 
drawing false conclusions from tests of 
significance occurs in Freeman’s discussion
of regression problems, where
significance of regression is misinter- 
preted to mean adequacy. On page 
100 we are told that the “adequacy of 
a straight line... may be tested by 
determining the probability that ....” 
It is clear that this tests whether the 
slope of the regression line is different 
from zero and that it cannot, for ex- 
ample, indicate whether a curved re- 
gression line is needed. On page 102, 
“If the regression line is inadequate, 
the mean square due to regression will 
not be significantly larger than the re- 
sidual or chance mean square.” Chance 
mean square, indeed! This mean square
contains all effects due to curvilinear- 
ity of regression! The same comment 
still applies. On page 104, “Linear re- 
gression does not account for a suffici- 
ent part of the total variability; it is 
not adequate for the purpose of pre- 
diction.” This conclusion is reached 
on a total of 10 degrees of freedom and 
without consideration of the accuracy 
required! On pages 107-8 the question 
is asked “Would the regression... 
enable us effectively to predict warp 
breakage, or is the residual variability 
too great?” The question is not an- 
swered explicitly, but a regression on 
the “important influence, namely, 
relative humidity” is calculated, and 
found to be highly significant. It is 
not pointed out that the effect of regression
is to reduce the mean square
from 1.23 to 1.18, and the standard
error of estimate by less than 3 per 
cent. Is this effective prediction? 
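
The size of this reduction follows from the two mean squares quoted above, since the standard error of estimate is the square root of the residual mean square. A short check in Python:

```python
import math

ms_before, ms_after = 1.23, 1.18  # mean squares quoted in the text

# The standard error of estimate is the square root of the mean square,
# so the proportional reduction is 1 - sqrt(ms_after / ms_before).
reduction = 1 - math.sqrt(ms_after / ms_before)
print(f"standard error of estimate reduced by {100 * reduction:.1f} per cent")
```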

Finally on page 117 there is a com- 
bined conclusion, that because F =35.8 
the regression describes the relationship 
between tensile strength, hardness and 
density. There is no reason to believe 
that “this relationship,” whatever that 
may mean, is linear. 

THE PROBLEM OF NORMALITY. Free- 
man’s attitude toward normality in 
small samples does not seem to be 
clear or consistent. On page 9 we are 
told that “One of the facts assumed to 
be known of the population ... nor- 
mally distributed” and this is followed 
on the same page by “The technique 
of testing the hypothesis that the popu- 
lation is normal is similar in general 
nature to the test of the hypothesis 
d’=0....” This is followed by a de- 
scription of the a and }, tests of nor- 
mality. These tests are applied to a 
sample of 15, with no specific indica- 
tion of the wide variations from nor- 
mality which would not be detected 
in such a sample. Freeman states only 
that “For small samples these tests 
are sensitive only to large departures 
from normality.” 

On page 21, we have a paired com- 
parison of 28 differences, one of which 
is exceptionally large. This is clearly a 
fertile field for a test of normality, 
even with a sample as small as 28. 
Yet there is no such test, but only a 
discussion of the possible omission of 
this observation. Freeman states that 
omission is unsound without ancillary 
information, and does not discuss the 
very practical problem where appar- 
ently discrepant observations are due 
to nonnormality. 

On page 25, we have a sample of 5 
and a firm decision not to make a test 
of normality. We conclude that Free- 
man would stop testing somewhere 
between 15 and 5 observations. On 
page 27 we are told that if the sample 
is larger than 50, and if the population 
is as much as 10 times as large as the 
sample “the tendency to normality of 
the distribution of the means of ran- 
dom samples is negligibly affected by 
the nature of the population.” The 
well-known Cauchy distribution and 
its many relatives show the incorrectness
of this as a mathematical statement.
Opinions may differ as to its application
in practice, but we doubt its
validity. The student may easily conclude
that in samples of 50 or more
it is useless to test normality, since it 
does not matter. We would hold, on 
the contrary, that samples of more 
than 50 are needed to make a test of 
normality which is sensitive enough 
to give any comfortable assurance of 
sufficient normality to warrant the 
use of the common tests at even the 5 
per cent level. In practice, evidence 
about normality or its lack comes from 
broad experience or other samples. 
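
The Cauchy counterexample is easy to see by simulation. The following sketch, with an arbitrary seed and replication count chosen for illustration, compares the spread of single Cauchy observations with the spread of means of 50 of them, and with means of 50 normal observations for contrast. The interquartile range is used because the Cauchy distribution has no variance.

```python
import math
import random

def cauchy(rng):
    """One draw from the standard Cauchy distribution (inverse-CDF method)."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr(values):
    """Interquartile range from simple order statistics."""
    s = sorted(values)
    return s[(3 * len(s)) // 4] - s[len(s) // 4]

rng = random.Random(1946)  # fixed seed so the run is repeatable
reps, n = 4000, 50

single = [cauchy(rng) for _ in range(reps)]
means = [sum(cauchy(rng) for _ in range(n)) / n for _ in range(reps)]
normal_means = [sum(rng.gauss(0, 1) for _ in range(n)) / n for _ in range(reps)]

print(f"IQR of single Cauchy draws:      {iqr(single):.2f}")
print(f"IQR of means of {n} Cauchy draws: {iqr(means):.2f}")
print(f"IQR of means of {n} normal draws: {iqr(normal_means):.2f}")
```

The means of Cauchy samples are as widely spread as single observations, whereas the normal means tighten as expected; averaging 50 observations does nothing for the Cauchy case.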

On pages 37-38 Freeman develops a 
more or less conventional derivation 
of the normal distribution from many 
contributing effects. There is no indication
that these effects must be additive.
(Clearly if additive effects make
X normally distributed, then multiplicative
effects make e^X nonnormal!)

THE ANALYSIS OF VARIANCE. The 
introduction of the analysis of vari- 
ance on page 58 seems exceedingly 
messy. We are unable to determine 
whether Freeman wishes to treat the 
column population means as fixed or 
as a random sample from a normal 
population. We feel that any adequate 
treatment must discuss and dis- 
tinguish both approaches. 

On page 63 we are told that “Five 
independent estimates of the variance 
of the population can be found, of 
which four are listed in the following 
table.” There is no such fifth estimate. 

On page 72 we are told that “The 
F test, as used in the analysis of vari- 
ance, is essentially the ratio of vari- 
ability associated with a suspected 
cause to error.” This is, it seems to us, 
exceedingly liable to misinterpretation 
—only if the reader really understands 
the analysis of variance in advance can 
he interpret the word “associated” 
safely. 

On the same page Freeman tests the 
variability of lot means. If he had 
wished to make the test on the assump- 
tion that either (a) the same nine 
rolls from each of the same three lots 
are to be tested and retested, or (b)
there are exactly nine rolls in a lot,
then it would have been proper to test
against the mean square for error. If
he wished to treat the rolls as samples
of large lots of rolls, it would have
been proper to test against the mean 
square for rolls within lots. The first 
is highly significant, the second very 
near the 50 per cent point. If there are
many rolls per lot the conclusion would 
be that there are differences from roll 
to roll of statistical significance and 
that these adequately explain the 
observed lot-to-lot variation. Freeman, 
however, tested against a pooled mean 
square for which we know of no 
reasonable interpretation! 
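
The difference between the two proper tests can be sketched on made-up data. In the Python sketch below the design (3 lots, 9 rolls per lot, 2 measurements per roll), the data, and all names are hypothetical; Freeman's own figures are not reproduced. The data are generated with real roll-to-roll variation and no added lot effect.

```python
import random
from statistics import mean

def nested_anova(data):
    """Mean squares for lots, rolls within lots, and error.

    data[i][j] is the list of repeat measurements on roll j of lot i
    (balanced: equal rolls per lot and equal repeats per roll).
    """
    lots, rolls, reps = len(data), len(data[0]), len(data[0][0])
    roll_means = [[mean(roll) for roll in lot] for lot in data]
    lot_means = [mean(rm) for rm in roll_means]
    grand = mean(lot_means)

    ss_lots = rolls * reps * sum((m - grand) ** 2 for m in lot_means)
    ss_rolls = reps * sum((rm - lot_means[i]) ** 2
                          for i, lot in enumerate(roll_means) for rm in lot)
    ss_error = sum((y - roll_means[i][j]) ** 2
                   for i, lot in enumerate(data)
                   for j, roll in enumerate(lot) for y in roll)
    return {
        "lots": ss_lots / (lots - 1),
        "rolls within lots": ss_rolls / (lots * (rolls - 1)),
        "error": ss_error / (lots * rolls * (reps - 1)),
    }

def make_roll(rng):
    level = rng.gauss(50, 3)  # true level of this roll
    return [level + rng.gauss(0, 1) for _ in range(2)]

rng = random.Random(0)
data = [[make_roll(rng) for _ in range(9)] for _ in range(3)]
ms = nested_anova(data)
print("F, lots vs error:            ", ms["lots"] / ms["error"])
print("F, lots vs rolls within lots:", ms["lots"] / ms["rolls within lots"])
```

Testing lots against error asks whether these particular rolls differ; testing against rolls within lots asks whether the lots differ by more than roll-to-roll variation would explain.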

The discussion of classification of 
interaction terms on page 85 seems un- 
satisfactory. It begins with the state- 
ment that “between pots” has no in- 
terest because “the pots were not used 
in any particular sequence.” In their 
original paper (referred to by Free- 
man) Hampton and Gould state that, 
in this series of tests, the pots were selected
to be as different as possible;
the mean square found (the largest of
all) suggests a definite systematic 
effect due perhaps to selection of pots 
or to the fact that the same two 
arches were used in all runs. Freeman 
states that “between runs” also means 
little because “the runs are quite inde- 
pendent of each other”—yet they were 
made at different times and contain 
the effects of weather and of the judge- 
ment of the furnace operator as to 
when the temperature should be re- 
duced. The mean square for runs is 
the third largest in the analysis. Pro- 
ceeding on these assumptions, which 
are likely to be incorrect, Freeman is 
then able, by some principle not clear 
to us, to classify together under
“among cylinders” (page 86) terms 
with mean squares of 4566, 1922 and 
183—by the size of their mean squares 
alone they are of vastly different na- 
ture. Freeman seems to imply that to 
interpret an interaction term it should 
be classified under one or the other 
main effects, a procedure which seems 
to us contrary to the basic principles 
of the analysis of variance. 

No account is given of how the 
analysis of variance technique can be 
applied to the estimation of compo- 
nents of variance. This we have found 
at least as important as the significance 
test use of analysis of variance, and its 
omission we consider particularly un- 
fortunate. The omission is the more 
surprising in that Tippett (upon whom 
the author has drawn for several ex- 
amples) is very clear on the point. 
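
For a balanced nested design, components of variance are commonly estimated by equating mean squares to their expectations. The sketch below assumes a lots / rolls within lots / error layout with all effects random; the numbers and the function name are hypothetical.

```python
def variance_components(ms_lots, ms_rolls, ms_error, rolls_per_lot, reps_per_roll):
    """Method-of-moments estimates for a balanced nested design.

    Expected mean squares (all effects random):
      error             : s2_error
      rolls within lots : s2_error + reps * s2_rolls
      lots              : s2_error + reps * s2_rolls + rolls * reps * s2_lots
    """
    s2_error = ms_error
    s2_rolls = (ms_rolls - ms_error) / reps_per_roll
    s2_lots = (ms_lots - ms_rolls) / (rolls_per_lot * reps_per_roll)
    # Negative estimates are conventionally reported as zero.
    return {"error": s2_error, "rolls": max(s2_rolls, 0.0), "lots": max(s2_lots, 0.0)}

# Hypothetical mean squares for 9 rolls per lot and 2 measurements per roll.
components = variance_components(ms_lots=20.0, ms_rolls=19.0, ms_error=1.0,
                                 rolls_per_lot=9, reps_per_roll=2)
print(components)
```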

On page 108, we are told that a cer- 
tain subsidiary variance analysis may 
be useful “for linear regressions calculated
from grouped data.” Why only
from grouped data? The statistician
may group his data at least as intel- 
ligently as the man who groups it be- 
fore letting the statistician see it. 

On page 74 we have error piled on 
error. All the mean squares are tested 
against the mean square for error, 
where for the conclusions drawn both 
“cut” and “lot” should be tested 
against “interaction.” Second, all the 
F critical values are incorrect—the 
degrees of freedom have been permuted 
in using the table. The correct critical 
value for “interaction vs. error” is 
about 2.31—the interaction is sig- 
nificant. The “lot vs. interaction” test 
is significant and nearly highly sig- 
nificant, while “cut vs. interaction” is 
far from significance. The correct con- 
clusion is that there were significant 
lot differences and significant interaction
between lot and cut, but that
there was no indication of an average 
effect of direction of cutting. Compare 
these with Freeman’s conclusions. 

On page 56 Freeman states that “it
is unlikely ... that just five machines
would be used and that each would be 
used exactly once with each grid and 
each operator. If these conditions are 
not satisfied, machine effects cannot be 
removed.” Nonorthogonal analysis of 
variance is not easy to analyze or in- 
terpret—but it is an essential tool in 
deciphering incomplete experiments. 

REGRESSION. The treatment of re- 
gression is unsatisfactory in a number 
of respects. Some of these have been
referred to elsewhere, in particular the 
use of a significance test as a criterion 
of “adequacy” of a regression equation. 

Perhaps the most serious confusion 
exists on the question of the choice of 
independent variable. On page 113, it 
is stated that this choice “depends 
not on what we would like to predict, 
but on which of the two variables, X
and Y, is free from error.” The author 
goes on to say that if X is free from 
error, and we wish to predict X from 
Y, we should obtain the regression of 
Y on X, and then solve this for X. 
But on page 99, he treats a problem in 
which a measurement Y is accurate 
but costly and destructive, and it is 
desired to substitute a cheaper but 
less accurate nondestructive measurement
X. To solve this problem he obtains
the regression of Y on X.

This is not the place for a complete 
discussion of this problem; but it
should be pointed out that the statement
on page 113 is wrong, and that
the proper regression was fitted on page 
99. It may be suspected that the state- 
ment on page 113 arose out of a mis- 
reading of a paper by Eisenhart to 
which reference is made. Eisenhart 
was concerned with the problem which 
arises when the values of one variable 
have been arbitrarily fixed. In this 
case, of course, only one regression line 
is available. 

THE PHILOSOPHY OF EXPERIMENTA- 
TION. On page 5, Freeman indicates 
that it would have been better to omit 
“muck soil” from the pipe experiment 
if the experimenter knew in advance 
that it behaved differently from the 
other soils, because “its inclusion is 
uninformative and the loss in precision 
is costly.” This conclusion is clearly 
not generally valid, for the real pur- 
pose of this experiment will often be 
to serve as a guide in choosing pipe 
rather than in estimating the average 
effect. In the situation envisaged here 
it would probably be wise to allot not 
1, but 3 or 4 of the small number of 
pairs available to different mucky soils 
and to agree in advance that the 
mucky and nonmucky groups be 
analyzed separately. 

On page 7, Freeman states that the 
intelligent experimenter will always 
use a method involving a variable 
rather than a method involving cate- 
gories. Some attention must certainly 
be given to the relative difficulties of 
obtaining equal sized samples before 
an adequate answer can be given. 

On page 67, we are told that the use 
of the Latin Square “wastes” degrees 
of freedom which could otherwise be 
allotted to error. This question arises 
not only for Latin Squares, but in 
every design situation where some 
degrees of freedom have been allotted 
to some variable of potential impor- 
tance or effect. It seems to us that no 
waste is involved; if the mean square 
for this additional variable is about the 
size of the error, the average practi- 
tioner will, wisely or unwisely, pool 
these together with no loss of degrees 
of freedom, while if the mean square 
is large, it has been worthwhile by 
reducing the size of the error mean 
square. 

BASIC ASSUMPTIONS. On page 44 we 
are told that the estimate of σ² based
on the sum of squares of deviations 
from the sample mean is “best” in the
sense of minimum variance. This occurs
in a section discussing the unbiased
nature of this estimate (Freeman
introduces the excellent term
“mean estimate” here), where every 
result except this one about minimum 
variance applies to arbitrary distribu- 
tions. There is no warning that mini- 
mum variance is known only for the 
normal distribution, while for other 
distributions, e.g., the double expo- 
nential, other estimates have minimum 
variance. The same dangerous state- 
ment is applied to two samples at the 
top of page 47. 

At the foot of page 98 we are told 
that if we draw chips (each bearing X 
and Y) from a bowl, and if the cor- 
relation between X and Y is zero, then 
the ordinary regression analysis leads 
to a quantity with the F distribution. 
This is, of course, only known to be
true when the distribution of X and Y 
is bivariate normal, and is false when, 
for example, their distribution is bi- 
variate uniform. 

On page 106 we are told that “It is 
here assumed that the variances of all 
eight columns are equivalent, within 
the limits of chance variation. This as- 
sumption, which may be checked by 
the L₁ test, must be met for the F test
to be valid.” It seems to us that any 
interpretation that a student is likely 
to put on this is wrong. For example, 
these misinterpretations seem likely: 
(a) Unless the sample variances of the 
eight columns meet the L₁ test, the F
test is invalid. (b) If the population
variances of the eight columns are
nearly enough equal to allow the sample
to pass the L₁ test, the F test is
exact. The proper interpretation, of 
course, would be: (c) The F test is 
only exact when the eight column 
populations are normal with the same 
variance; we may be able to detect 
large deviations from equality in the 
sample; the F test is not badly affected 
by lack of equality; we will not make 
too many serious mistakes if we use the 
F test, checking suspicious cases with 
the L₁ test and watering down our
conclusions when the lack of homo- 
geneity seems large. 

CONTROL. On pages 131-32, we are 
introduced to a definition of control 
which seems to us widely different 
from that employed in practice. Im- 
plicitly it seems to state that a process 
is in control when its output is random.
In Shewhart’s original discussion,
he made it clear that he defined
control economically, not statistically. 
Control meant that it was not eco- 
nomic to try to remove further causes 
of variation, which would almost cer- 
tainly exist and cause non-randomness 
in any practical state of control. The 
control chart procedure was developed 
empirically to help to attain this state 
—and was to be used both before and 
after control was reached. Yet on page 
133 we are told that the type of “systematic
quality control” discussed in
this chapter cannot be used until
population homogeneity has been
reached. On this internal evidence, 
then, these methods are only useful for 
the, possibly uneconomic, refinement 
of a quality control program set up by 
more useful methods. 

In passing we note the equality be- 
tween sample and population values 
which page 133 promises “will be
shown” later. We presume that an 
averaging operation has been neg- 
lected. 

The table on page 134 offered an 
alert expositor the chance to point 
out the change in level of defectives 
between the 6th and 7th days. Any 
plot of p would make this obvious. 
It seems poor pedagogy to point out 
that the results are not homogeneous, 
and not to take the data from the 7th 
day on and show how they behave. 
Noting the error of 9.8 for 7.8, six of 
the 7 values of “p ... in following 
sample” on page 135 are below the 
control limits. The reader should be 
told how suspicious this is. 
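
How such a change in level shows up against control limits can be sketched with conventional 3-sigma limits for a fraction defective. The counts and the sample size below are hypothetical and are not Freeman's table; limits are set from the first six days and the later days are compared with them.

```python
import math

def p_chart_limits(defectives, sample_size):
    """Lower limit, centre line and upper limit (3-sigma) for fraction defective."""
    p_bar = sum(defectives) / (sample_size * len(defectives))
    sigma = math.sqrt(p_bar * (1 - p_bar) / sample_size)
    return max(p_bar - 3 * sigma, 0.0), p_bar, p_bar + 3 * sigma

# Hypothetical daily counts of defectives in samples of 400.
baseline = [44, 50, 38, 48, 42, 46]   # days 1 to 6
later = [18, 22, 16, 20, 24, 18]      # days 7 to 12, after a change in level
n = 400

lower, centre, upper = p_chart_limits(baseline, n)
below = [d for d in later if d / n < lower]
print(f"limits from days 1-6: {lower:.3f} to {upper:.3f}")
print(f"{len(below)} of {len(later)} later values fall below the lower limit")
```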

MINOR POINTS. In general there 
seem to be unduly many significant 
figures. On page 13, for a sample of 15, 
we are given an estimate of σ² with 6
“significant figures;” surely three could 
have been dropped without loss. But 
on page 117 we have a mean square
on 2 df given to 10 “significant fig- 
ures,” where 3 could be regarded as 
slightly excessive. This is more inter- 
esting since, in the same table, a mean 
square on 57 df is given to three less 
“significant figures.” This seems quite 
illogical. 

The exponent of 4 instead of 1 for 
the denominator of the estimate of py, 
on page 32 might be annoying. On page 
69, the last parenthesis should be 
squared. 

JOHN W. TUKEY, Department of
Mathematics, Princeton University

CHARLES P. WINSOR, Department
of Biostatistics, School of
Hygiene and Public Health,
Johns Hopkins University


ELEMENTARY STATISTICAL 
METHODS 


Elementary Statistical Methods as
Applied to Business and Economic
Data (New York: Macmillan Co., 
1943) by William Addison Neiswanger 
is an elementary text designed espe- 
cially for use in a first-semester course. 
The theory of statistics is illustrated 
with concrete examples. Formulas are 
given without proof. This is, of course, 
necessary because an economics fresh- 
man would not have enough mathe- 
matical training to follow the deriva- 
tions of the formulas. But the author 
has gone too far in distorting the the- 
ory of statistics. 

In regard to normal distribution, the 
author said (page 209), “This normal 
distribution is also referred to as ‘the 
bell-shaped curve,’ ‘the curve of er- 
rors,’ ‘the binomial distribution,’ all 
of which, while having technical dif- 
ferences as to meaning, are commonly 
used interchangeably when referring 
to this distribution.” Since the author 
realizes that these terms have technical 
differences, he should discourage rather 
than encourage their misuse. It is es- 
pecially absurd to call a normal distri- 
bution a binomial distribution. The 
author states repeatedly (page 249), 
“normal in the Gaussian sense.” Does 
a normal distribution have any other 
sense? 

In discussing the merits of the dif- 
ferent kinds of averages such as the 
mean and the median, the author 
failed to mention the most important 
property of the mean, i.e., the mean 
generally is a more efficient statistic 
than the median. 

The author called (p. 340) the coef- 
ficient of variation “a pure number 
which describes relative variation.” 
It is true that this coefficient is inde- 
pendent of unit, but he did not men- 
tion the fact that the coefficient of vari- 
ation depends on the origin which is 
an undesirable property of this coef- 
ficient. A mean can be zero and indeed 
can be a negative number. 
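
The dependence on the origin is easily demonstrated. In the sketch below the temperatures are hypothetical; a change of unit alone leaves the coefficient unchanged, while a shift of origin (Celsius to Kelvin) alters it greatly.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation divided by the mean."""
    return pstdev(values) / mean(values)

celsius = [18.0, 20.0, 22.0, 24.0, 26.0]   # hypothetical temperatures
kelvin = [c + 273.15 for c in celsius]     # same unit size, new origin
doubled = [2 * c for c in celsius]         # change of unit only

for name, v in [("Celsius", celsius), ("Kelvin", kelvin), ("doubled", doubled)]:
    print(f"{name:8s} CV = {coefficient_of_variation(v):.4f}")
```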

The author defined (p. 346) the 
standard error of the mean as “the 
standard deviation of a series of means 
computed from large samples, drawn
at random from a single, homogeneous
universe.” 

First, the size of the samples, large 
or small, has nothing to do with the 
definition of the standard error, but 
the samples must be of the same size. 
Second, the meaning of “a series of 
means” is not clear, but it was inter- 
preted later (p. 349) as the means of 
“a large number of random samples.” 
The number of the samples is quite 
irrelevant to the definition of the 
standard error of the means. The 
“series of means” must be composed 
of all possible means, the number of 
which may be large or small. 

The author did not give the exact 
relation between the standard error 
of the mean and the standard devia- 
tion of the population which is 
σ_x̄ = σ/√N. Instead, he gave an equation
(p. 348) showing the relation of
the standard error of the mean and the 
standard deviation of the sample 
without saying that it is only an ap- 
proximate relation. 
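
Both points, that the “series of means” consists of all possible means and that its standard deviation is exactly σ/√N, can be checked by enumeration on a small universe. The population below is hypothetical, and sampling is taken with replacement.

```python
import math
from itertools import product
from statistics import mean, pstdev

population = [2, 4, 6, 8, 10, 12]   # hypothetical finite universe
n = 3                               # common sample size (N in the text)

# Every possible ordered sample of size n, drawn with replacement.
all_means = [mean(s) for s in product(population, repeat=n)]

sigma = pstdev(population)
print(f"number of possible samples:      {len(all_means)}")
print(f"standard deviation of all means: {pstdev(all_means):.6f}")
print(f"sigma / sqrt(N):                 {sigma / math.sqrt(n):.6f}")
```

The agreement holds for any sample size, small or large, which is the point made above about the definition.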

In testing the significance of the 
difference between two sample means, 
the author states the null hypothesis 
as this (p. 357): “Our problem here is 
to determine whether both might 
reasonably be viewed as having come 
from one and the same parent popula- 
tion or universe,” but the standard 
error of the difference between the 
sample means is given as 


√(σ₁²/N₁ + σ₂²/N₂)


where σ₁² and σ₂² are the variances of
the samples and not the unbiased 
estimates of the population variance. 
If the null hypothesis is that both 
samples are drawn from the same pop- 
ulation, a pooled estimate of the popu- 
lation variance should be used. This 
formula would be correct if the null 
hypothesis were that the samples are 
drawn from two different populations 
with the same mean. Throughout the 
discussion, the author failed to draw 
a distinction between the two different 
null hypotheses. 
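
The two standard errors appropriate to the two null hypotheses can be compared on hypothetical samples. In this sketch `se_separate` uses the plain sample variances (divisor N), as in the formula criticized above, and `se_pooled` uses one pooled, unbiased estimate of a common variance; both function names are illustrative.

```python
import math
from statistics import pvariance

def se_separate(x, y):
    """sqrt(s1^2/N1 + s2^2/N2), with s^2 the sample variances (divisor N)."""
    return math.sqrt(pvariance(x) / len(x) + pvariance(y) / len(y))

def se_pooled(x, y):
    """Standard error from one pooled, unbiased estimate of a common variance."""
    ss = pvariance(x) * len(x) + pvariance(y) * len(y)   # pooled sum of squares
    s2 = ss / (len(x) + len(y) - 2)
    return math.sqrt(s2 * (1 / len(x) + 1 / len(y)))

x = [12.1, 11.4, 13.0, 12.6, 11.9]                       # hypothetical samples
y = [10.2, 14.8, 9.1, 15.3, 12.7, 8.9, 13.6, 11.0, 16.4, 10.5]
print(f"separate-variance standard error: {se_separate(x, y):.3f}")
print(f"pooled-variance standard error:   {se_pooled(x, y):.3f}")
```

When the sample sizes and variances differ, as here, the two figures can be far apart, so the choice of null hypothesis matters in practice.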

The presentation of “Analysis of 
Functional Relationship” is very awkward.
The regression does not deal
with the analysis of the functional relationship
between two variables x
and y, as the author suggested. It deals
with the functional relationship between
x and the means of the arrays
of y.

The author spent 45 pages in discussing
the linear regression, but he did
not even give the definition of the line 
of regression. Intentionally or unin- 
tentionally the author gave the impres- 
sion that the line fitted by the method 
of least squares is the line of regression. 
He did not say that the method of 
least squares is only a method of esti- 
mating the line of regression of the 
population. In fact, he cannot possibly 
make such a statement because the 
specific population is given at the end 
rather than at the beginning of the 
chapter on linear regression. 

The population was specified as fol- 
lows (p. 639): “The most important 
assumptions are four in number: (1) 
The assumption of linearity. (2) The 
assumption of normality. (3) The as- 
sumption of independent events. (4) 
The assumption that Y is a function 
of X.” The important assumption of 
homoscedasticity is not included. 

The assumption of linearity is er- 
roneously explained as “... straight 
lines will describe the functional rela- 
tion between the variables.” The as- 
sumption of linearity is known to the 
writer as the locus of the means of the 
arrays of y being a straight line. 

For the assumption of normality, 
the author gave a bivariate normal 
distribution (p. 643). This is a sufficient 
assumption, but the assumption that 
x is also normally distributed is not
necessary. 

Since the assumption of homosce- 
dasticity is not included in the discus- 
sion, the meaning of the standard error 
of estimate has never been made clear. 
The residue sum of squares divided by 
N —2 is the unbiased estimate of the 
variance of the arrays of y. The author 
tried to present this fact (p. 634) by 
quoting a formula from Mordecai 
Ezekiel’s Methods of Correlation Analy- 
sis, 1930, where the unbiased estimate 
is given as S_y²(N−1)/(N−2), but
Ezekiel’s S_y² is the residue sum of
squares divided by N−1 (p. 114,
Ezekiel), while the author’s S_y² is the
residue sum of squares divided by N.
Therefore, the correction should be
N/(N−2) rather than (N−1)/(N−2).
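
The arithmetic of the two correction factors can be verified with any residue sum of squares; the values below are hypothetical.

```python
def unbiased_residual_variance(rss, n):
    """Residue sum of squares over its N - 2 degrees of freedom."""
    return rss / (n - 2)

rss, n = 36.0, 20                      # hypothetical values

s2_div_n = rss / n                     # S^2 defined with divisor N
s2_div_n1 = rss / (n - 1)              # S^2 defined with divisor N - 1

target = unbiased_residual_variance(rss, n)
print(target)
print(s2_div_n * n / (n - 2))          # matching factor for divisor N
print(s2_div_n1 * (n - 1) / (n - 2))   # matching factor for divisor N - 1
print(s2_div_n * (n - 1) / (n - 2))    # mismatched factor: too small
```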

This can hardly be called a good 
book, not because it is worse than some 
of the existing elementary statistics 
books, but because it shows no im- 
provement over them. 

JEROME C. R. LI, Instructor in
Mathematics, Oregon State College

Pp. 32. Paper.

Karl J. Holzinger (Professor of Education).
(University of Chicago.) Sch R 54(6) :364-7 
Je '46.* [636 

Mathematics as a Tool in the Water- 
Works Laboratory. Harold A. Thomas, Jr. 
(Assistant Professor of Sanitary Engineer- 
ing, Harvard University). J New England 
Water Works Assn 60(2):153-61 Je '46.* [637

Tables for Testing the Homogeneity of 
a Set of Estimated Variances. Computed 
by Catherine M. Thompson and Maxine 
Merrington. Prefatory note (pp. 296-301) 
by H. O. Hartley (Scientific Computing 
Service, Ltd., 23 Bedford Sq., London 
W.C.1, England) and E. S. Pearson (De- 
partment of Statistics, University College, 
London W.C.1). Biometrika 33(4) : 296-304 
Je '46.* [638 

A Single Plane Method of Rotation. 
L. L. Thurstone (Professor of Psychology, 
University of Chicago). Psychometrika 
11(2):71-9 Je '46.* [639

General Solution of the Analysis of Vari- 
ance and Covariance in the Case of Unequal 
or Disproportionate Numbers of Observa- 
tions in the Subclasses. Fei Tsao (National 
Central University, Chungking, China). 
Psychometrika 11(2):107-28 Je '46.* [640 

A Normalized Graphic Method of Item 
Analysis. William W. Turnbull (Head, Test 
Construction Department, College Entrance
Examination Board, Princeton,
N. J.). J Ed Psychol 37(3):129-41 Mr '46.* [641

Quality Control Cuts Costs. James Turner.
Am Bus 16(6):14-5 Je '46.* [642


Minimal Variance and Its Relation to 
Efficient Moment Tests. J. R. Vatnsdal 
(Associate Professor of Mathematics, State 
College of Washington). Ann Math Stat 
17(2): 198-207 Je '46.* [643 

The Probability Distribution of the Meas- 
ure of a Random Linear Set. David F. 
Votaw, Jr. (Naval Ordnance Laboratory). 
Ann Math Stat 17(2):240-4 Je '46.* [644 

Incomplete-Block Design Adapted to 
Paired Tests of Mosquito Repellents. F. M. 
Wadley (Statistical Consultant, Bureau of 
Entomology and Plant Quarantine, U. S. 
Department of Agriculture, Washington). 
Biometrics B 2(2):30-1 Ap '46.* [645

Tolerance Limits for a Normal Distribu- 
tion. A. Wald (Professor of Mathematical 
Statistics) and J. Wolfowitz (Associate Pro- 
fessor of Mathematical Statistics). (Colum- 
bia University.) Ann Math Stat 17(2): 
208-15 Je '46.* [646 

Some Order Statistic Distributions for 
Samples of Size Four. John E. Walsh (De- 
partment of Mathematics, Princeton Uni- 
versity). Ann Math Stat 17(2):246-8 Je 
'46.* [647 

Yule’s “Characteristic” and the “Index 
of Diversity.” Letter. C. B. Williams 
(Rothamsted Experimental Station, Har- 
penden, Herts, England). Nature 157(3989) 
:482 Ap 13 '46.* [648 

The Problem of Probability: A Sympo- 
sium on Probability, Part III. Donald 
Williams (Associate Professor of Philosophy, 
Harvard University). Philos & Phenom 
Res 6(4):619-22 Je '46.* Reply to 614 and 
615; see also 567, 606. [649 

A Simple Test of Significance. Letter. 
E. J. Williams (P. O. Box 18, South Mel- 
bourne, Victoria, Australia). Eng 161(4193): 
496 My 24 '46.* A criticism of 407. [650 

Inequalities in Terms of Mean Range. 
C. B. Winsten. Biometrika 33(4):283-95 
Je '46.* [651 

Should Engineers Study Statistics? 
Holbrook Working (Professor of Prices and 
Statistics, Stanford University). J Eng Ed 
36(9):557-64 My '46.* [652 


PERIODICALS REPRESENTED FOR THE FIRST TIME 


Am Bus—American Business. 12 issues; $3 
(35¢); Dartnell Publications, Inc., 4660 
Ravenswood Ave., Chicago 40, Ill. 


Am J Pub Health—American Journal of 
Public Health. 12 issues; $5(50¢); 1790 
Broadway, New York 19, N. Y. 

Auto & Aviation Ind—Automotive and 
Aviation Industries. 24 issues in 2 vols.; 
$1(25¢); Chilton Co., Chestnut & 56th 
Sts., Philadelphia 39, Pa. 

Beama J—Beama Journal. 12 issues; (1s.); 
British Electrical & Allied Manufactur- 
ers’ Association (Inc.), 36 Kingsway, 
London, W.C.2, England. 

Ed & Psychol Meas—Educational and 
Psychological Measurement. 4 issues; 
$4($1.25); 917 15th St., N.W., Washing- 
ton 5, D. C. 

J Account—The Journal of Accountancy. 
12 issues in 2 vols.; $4; 13 East 41st St., 
New York, N. Y. 

J New England Water Works Assn—Jour- 
nal of the New England Water Works 
Association. 4 issues; $4($1.25); 609 
Statler Bldg., Boston, Mass. 

Proc Indian Acad Sci, Sect B—Proceedings 
of the Indian Academy of Science, Section 
B. 4 issues; Rs. 18 (Rs. 2 or 3s); Indian 
Academy of Sciences, Bangalore, India. 

Proc Royal Irish Academy, Sect A—Pro- 
ceedings of the Royal Irish Academy of 
Sciences. 

SAE J—SAE Journal. 12 issues; $10($1); 
Society of Automotive Engineers, 29 
West 39th St., New York 18, N. Y. 










